We propose a test for model specification of a parametric diffusion process based on a kernel estimation of the transitional density of the process. The empirical likelihood is used to formulate a statistic, for each kernel smoothing bandwidth, which is effectively a Studentized $L_2$-distance between the kernel transitional density estimator and the parametric transitional density implied by the parametric process. To reduce the sensitivity of the test to the choice of smoothing bandwidth, the final test statistic is constructed by combining the empirical likelihood statistics over a set of smoothing bandwidths. To better capture the finite sample distribution of the test statistic and data dependence, the critical value of the test is obtained by a parametric bootstrap procedure. Properties of the test are evaluated asymptotically and numerically by simulation and by a real data example.

Received November 2005; revised February 2007.

1 Supported by the National University of Singapore Academic research grant.
2 Supported by the NSF Grants DMS-06-04563 and SES-05-18904.
3 Supported by the Australian Research Council Discovery Grant.

AMS 2000 subject classifications. Primary 62G05; secondary 62J02.
Key words and phrases. Bootstrap, diffusion process, empirical likelihood, goodness-of-fit test, time series, transitional density.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2008, Vol. 36, No. 1, 167-198. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Let $X_1, \ldots, X_{n+1}$ be $n+1$ equally spaced (with spacing $\Delta$ in time) observations of a diffusion process

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t, \tag{1.1}$$

where $\mu(\cdot)$ and $\sigma^2(\cdot) > 0$ are, respectively, the drift and diffusion functions, and $B_t$ is the standard Brownian motion. Suppose a parametric specification of model (1.1) is

$$dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dB_t, \tag{1.2}$$
where $\theta$ is a parameter within a parameter space $\Theta \subset \mathbb{R}^d$ for a positive integer $d$. The focus of this paper is on testing the validity of the parametric specification (1.2) based on a set of discretely observed data $\{X_{t\Delta}\}_{t=1}^{n+1}$.

In a pioneering work that represents a breakthrough in financial econometrics, Aït-Sahalia [1] considered two approaches for testing the parametric specification (1.2). The first one was based on an $L_2$-distance between a kernel stationary density estimator and the parametric stationary density implied by model (1.2), with the critical value of the test obtained from the asymptotic normal distribution of the test statistic. The advantage of the test is that the parametric stationary density is easily derivable for almost all processes, and performing the test is straightforward. The test has several limitations. One, as pointed out by Aït-Sahalia [1], is that a test that targets the stationary distribution is not conclusive, as different processes may share a common stationary distribution. Another is that it can take a long time for a process to produce a sample path that contains enough information for accurate estimation of the stationary distribution. These limitations were confirmed by Pritsker [42], who reported a noticeable discrepancy between the simulated and nominal sizes of the test under a set of Vasicek [47] diffusion processes. In the same paper Aït-Sahalia considered another approach based on a certain discrepancy measure regarding the transitional distribution of the process derived from the Kolmogorov forward and backward equations. The key advantage of a test that targets the transitional density is that it is conclusive, as the transitional density fully specifies the dynamics of a diffusion process due to its Markovian property.

In this paper we propose a test that is focused on the specification of the transitional density of a process. The basic building blocks used in constructing the test statistic are the kernel estimator of the transitional density function and the empirical likelihood (Owen [41]). We first formulate an integrated empirical likelihood ratio statistic for each smoothing bandwidth used in the kernel estimator, which is effectively an $L_2$-distance between the kernel transitional density estimator and the parametric transitional density implied by the process. The use of the empirical likelihood allows the $L_2$-distance to be standardized by its variance. We also implement a series of measures to make the test work more efficiently. These include properly smoothing the parametric transitional density so as to cancel out the bias induced by the kernel estimation, which avoids undersmoothing and simplifies the theoretical analysis. To make the test robust against the choice of smoothing bandwidth, the test statistic is formulated based on a set of bandwidths. Finally, a parametric bootstrap procedure is employed to obtain the critical value of the test, and to better capture the finite sample distribution of the test statistic and the data dependence induced by the stochastic process.

A continuous-time diffusion process and a discrete-time time series share some important features. They can both be Markovian and weakly dependent, satisfying certain mixing conditions. The test proposed in this paper draws on research on kernel-based testing and estimation of discrete-time models established in the last decade or so. Kernel-based tests have been shown to be effective in testing discrete time series models, as demonstrated in Robinson [44], Fan and Li [24], Hjellvik, Yao and Tjøstheim [30], Li [38], Aït-Sahalia, Bickel and Stoker [4], Gozalo and Linton [27] and Chen, Härdle and Li [12]; see Hart [29] and Fan and Yao [18] for extended reviews and lists of references. For kernel estimation of diffusion processes, in addition to Aït-Sahalia [1], Jiang and Knight [35] proposed a semiparametric kernel estimator; Fan and Zhang [21] examined the effects of high-order stochastic expansions and proposed separate generalized likelihood ratio tests for the drift and diffusion functions; Bandi and Phillips [7] considered a two-stage kernel estimation without the strict stationarity assumption. See Cai and Hong [9] and Fan [15] for comprehensive reviews.

In an important development after Aït-Sahalia [1], Hong and Li [32] developed a test for diffusion processes via a conditional probability integral transformation. The test statistic is based on an $L_2$-distance between the kernel density estimator of the transformed data and the uniform density implied under the hypothesized model. Although a kernel estimator is employed, the transformation leads to asymptotically independent uniform random variables under the hypothesized model. Hence, the issue of modeling the data dependence induced by diffusion processes is avoided. In another important recent development, Aït-Sahalia, Fan and Peng [5] proposed a test for the transitional densities of diffusion and jump diffusion processes based on a generalized likelihood ratio statistic, which, like our current proposal, is able to fully test diffusion process specification and has attractive power properties.

The paper is structured as follows. Section 2 outlines the hypotheses and the kernel smoothing of transitional densities. The proposed empirical likelihood (EL) test is given in Section 3. Section 4 reports the main results of the test. Section 5 considers computational issues. Results from simulation studies are reported in Section 6. A federal funds rate data set is analyzed in Section 7. All technical details are given in the Appendix.

2. The hypotheses and kernel estimators. Let $\pi(x)$ be the stationary density and $p(y\mid x;\Delta)$ be the transitional density of $X_{(t+1)\Delta} = y$ given $X_{t\Delta} = x$ under model (1.1), and let $\pi_\theta(x)$ and $p_\theta(y\mid x;\Delta)$ be their parametric counterparts under model (1.2). To simplify notation, we suppress $\Delta$ in the notation of transitional densities and write $\{X_{t\Delta}\}$ as $\{X_t\}$ for the observed data. Let $\mathcal{X}$ be the state space of the process.

Although $\pi_\theta(x)$ has a closed-form expression via the Kolmogorov forward equation,

$$\pi_\theta(x) = \frac{\xi(\theta)}{\sigma^2(x;\theta)}\exp\left\{\int^{x}\frac{2\mu(u;\theta)}{\sigma^2(u;\theta)}\,du\right\},$$

where $\xi(\theta)$ is a normalizing constant, $p_\theta(y\mid x)$ as defined by the Kolmogorov backward equation may not admit a closed-form expression. However, this problem is overcome by the Edgeworth-type approximations developed by Aït-Sahalia [2, 3]. As the transitional density fully describes the dynamics of a diffusion process, the hypotheses we would like to test are

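$H_0: p(y\mid x) = p_{\theta_0}(y\mid x)$ for some $\theta_0 \in \Theta$ and all $(x,y) \in S \subset \mathcal{X}^2$, versus
$H_1: p(y\mid x) \neq p_\theta(y\mid x)$ for all $\theta \in \Theta$ and some $(x,y) \in S \subset \mathcal{X}^2$,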

where $S$ is a compact set within $\mathcal{X}^2$ and can be chosen based on the kernel transitional density estimator given in (2.1) below; see also the demonstrations in the simulation and case studies in Sections 6 and 7. As the parametric density $p_{\hat\theta}(y\mid x)$ is smoothed in the same way as the kernel estimator, the boundary bias associated with the kernel estimators ([16] and [40]) is avoided.
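For a given drift and diffusion, the closed form for $\pi_\theta(x)$ above can be evaluated numerically, which is also what Step 1 of the bootstrap in Section 4 needs. The Python sketch below is illustrative only: the function name `stationary_density`, the finite grid and the trapezoidal rule are our assumptions, and the normalizing constant $\xi(\theta)$ is obtained by normalizing numerically on that grid (so the grid must cover essentially all of the mass). For the Vasicek drift and a constant diffusion it should reproduce the normal density $N(\alpha, \sigma^2/(2\kappa))$.

```python
import numpy as np

def stationary_density(mu, sigma2, grid):
    """Evaluate pi(x) = xi / sigma^2(x) * exp{ int^x 2 mu(u) / sigma^2(u) du } on a grid.

    The inner integral is accumulated with the trapezoidal rule from the left end
    of the grid; xi is fixed by normalizing the result to integrate to one.
    """
    grid = np.asarray(grid, dtype=float)
    integrand = 2.0 * mu(grid) / sigma2(grid)
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)
    log_scale = np.concatenate(([0.0], np.cumsum(steps)))
    # Subtracting the maximum only changes the constant and avoids overflow.
    unnorm = np.exp(log_scale - log_scale.max()) / sigma2(grid)
    mass = np.sum(0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(grid))
    return unnorm / mass

# Vasicek drift kappa * (alpha - x) and constant diffusion, Model 0 values of Section 6.1.1
kappa, alpha, sigma2 = 0.85837, 0.089102, 0.0021854
grid = np.linspace(alpha - 0.3, alpha + 0.3, 4001)
pi = stationary_density(lambda x: kappa * (alpha - x),
                        lambda x: np.full_like(x, sigma2), grid)
```

Since the drift is linear here, the trapezoidal rule integrates it exactly, so any remaining error comes only from the numerical normalization.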

Let $K(\cdot)$ be a kernel function which is a symmetric probability density function, $h$ be a smoothing bandwidth such that $h \to 0$ and $nh^2 \to \infty$ as $n \to \infty$, and $K_h(\cdot) = h^{-1}K(\cdot/h)$. The kernel estimator of $p(y\mid x)$ is

$$\hat p(y\mid x) = n^{-1}\sum_{t=1}^{n} K_h(x - X_t)K_h(y - X_{t+1})\big/\hat\pi(x), \tag{2.1}$$

where $\hat\pi(x) = (n+1)^{-1}\sum_{t=1}^{n+1}K_h(x - X_t)$ is the kernel estimator of the stationary density used in Aït-Sahalia [1]. The local polynomial estimator introduced by Fan, Yao and Tong [19] can also be employed without altering the main results of this paper. It is known (Hyndman and Yao [34]) that

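$$E\{\hat p(y\mid x)\} - p(y\mid x) = \frac{h^2\sigma_K^2}{2}\left\{\frac{\partial^2 p(y\mid x)}{\partial x^2} + \frac{\partial^2 p(y\mid x)}{\partial y^2} + \frac{2\pi'(x)}{\pi(x)}\frac{\partial p(y\mid x)}{\partial x}\right\} + o(h^2),$$

$$\operatorname{Var}\{\hat p(y\mid x)\} = \frac{R^2(K)\,p(y\mid x)}{nh^2\,\pi(x)}\{1 + o(1)\},$$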

where $\sigma_K^2 = \int u^2K(u)\,du$ and $R(K) = \int K^2(u)\,du$. Here we use a single bandwidth $h$ to smooth the bivariate data $(X_t, X_{t+1})$. This is based on the consideration that $X_t$ and $X_{t+1}$ are identically distributed and hence have the same scale, which allows one smoothing bandwidth to smooth both components. Nevertheless, the results in this paper can be generalized to the situation where two different bandwidths are employed.
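A minimal Python sketch of the estimator (2.1) follows, assuming the biweight kernel adopted in Section 6 and a single bandwidth for both coordinates; the function names are hypothetical and the AR(1) path is toy data rather than one of the models studied here. Because each $K_h(y - X_{t+1})$ integrates to one, $\hat p(\cdot\mid x)$ should integrate to approximately one in $y$, which gives a quick sanity check.

```python
import numpy as np

def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def k_h(u, h):
    """Scaled kernel K_h(u) = h^{-1} K(u / h)."""
    return biweight(np.asarray(u, dtype=float) / h) / h

def transition_density_hat(x, y, X, h):
    """Kernel estimator (2.1) of p(y | x) from observations X_1, ..., X_{n+1}."""
    X = np.asarray(X, dtype=float)
    n = len(X) - 1
    pi_hat = k_h(x - X, h).sum() / (n + 1)                        # all n + 1 points
    joint = (k_h(x - X[:-1], h) * k_h(y - X[1:], h)).sum() / n    # pairs (X_t, X_{t+1})
    return joint / pi_hat

# Toy data: an AR(1) path, used only to exercise the estimator
rng = np.random.default_rng(0)
X = np.zeros(2001)
for t in range(2000):
    X[t + 1] = 0.5 * X[t] + rng.standard_normal()
value = transition_density_hat(0.0, 0.0, X, h=0.5)
```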

Let $\hat\theta$ be a consistent estimator of $\theta$ under model (1.2), for instance, the maximum likelihood estimator under $H_0$, and

$$w_s(y) = \frac{K_h(y - X_s)\{s_{2,h}(y) - s_{1,h}(y)(y - X_s)\}}{s_{0,h}(y)s_{2,h}(y) - s_{1,h}^2(y)} \tag{2.2}$$

be the local linear weight with $s_{r,h}(y) = \sum_{s=1}^{n+1}K_h(y - X_s)(y - X_s)^r$ for $r = 0, 1$ and $2$. In order to cancel out the bias in $\hat p(y\mid x)$, we smooth $p_{\hat\theta}(y\mid x)$ as

$$\tilde p_{\hat\theta}(y\mid x) = \frac{\sum_{t=1}^{n}K_h(x - X_t)\sum_{s=1}^{n+1}w_s(y)\,p_{\hat\theta}(X_s\mid X_t)}{\sum_{t=1}^{n}K_h(x - X_t)}. \tag{2.3}$$

Here we apply kernel smoothing twice: first, for each $X_t$, using the local linear weight to smooth $p_{\hat\theta}(X_s\mid X_t)$ over $X_s$, and then employing the standard kernel to smooth with respect to $X_t$. This is motivated by Härdle and Mammen [28]. It can be shown from the standard derivations in Fan and Gijbels [16] that, under $H_0$,

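$$E\{\hat p(y\mid x) - \tilde p_{\hat\theta}(y\mid x)\} = o(h^2) \tag{2.4}$$

and

$$\operatorname{Var}\{\hat p(y\mid x) - \tilde p_{\hat\theta}(y\mid x)\} = \operatorname{Var}\{\hat p(y\mid x)\}\{1 + o(1)\}. \tag{2.5}$$

Hence, the biases of $\hat p(y\mid x)$ and $\tilde p_{\hat\theta}(y\mid x)$ cancel out in the leading order, while smoothing the parametric density does not affect the asymptotic variance.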
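The double smoothing in (2.3) can be sketched in Python as follows. This is illustrative only: it assumes that the inner sum runs over all $n+1$ observations and the outer sum over $t = 1, \ldots, n$, and the callable `p_theta`, which stands in for the fitted $p_{\hat\theta}$, is a hypothetical interface. A useful check is that local linear weights sum to one and reproduce linear functions exactly, so a "density" that is linear in $y$ passes through (2.3) unchanged.

```python
import numpy as np

def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def k_h(u, h):
    """Scaled kernel K_h(u) = h^{-1} K(u / h)."""
    return biweight(np.asarray(u, dtype=float) / h) / h

def local_linear_weights(y, X, h):
    """Local linear weights w_s(y) of (2.2) at the design points X_s."""
    d = y - np.asarray(X, dtype=float)
    k = k_h(d, h)
    s0, s1, s2 = k.sum(), (k * d).sum(), (k * d**2).sum()
    return k * (s2 - s1 * d) / (s0 * s2 - s1**2)

def smoothed_parametric_density(x, y, X, h, p_theta):
    """Smoothed parametric density (2.3).

    p_theta(ys, xt) (hypothetical) returns p_{theta_hat}(ys | xt) for an array ys.
    """
    X = np.asarray(X, dtype=float)
    w = local_linear_weights(y, X, h)              # s = 1, ..., n + 1
    kx = k_h(x - X[:-1], h)                        # t = 1, ..., n
    active = np.nonzero(kx)[0]                     # only X_t with K_h(x - X_t) > 0 matter
    inner = np.array([w @ p_theta(X, X[t]) for t in active])
    return kx[active] @ inner / kx[active].sum()
```

In practice `p_theta` would be the closed-form or Edgeworth-approximated transitional density evaluated at $\hat\theta$.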



3. Formulation of test statistic. The test statistic is formulated by the empirical likelihood (EL) (Owen [41]). Despite being intrinsically nonparametric, EL possesses two key properties of a parametric likelihood: Wilks' theorem and the Bartlett correction. Qin and Lawless [43] established EL for parameters defined by generalized estimating equations, which is the broadest framework for EL formulation so far; this was extended by Kitamura [36] to dependent observations. Chen and Cui [10] showed that the EL admits Bartlett correction under this general framework. See also Hjort, McKeague and Van Keilegom [31] for extensions. The EL has been used for goodness-of-fit tests of various model structures. Fan and Zhang [23] proposed a sieve EL test for a varying-coefficient regression model that extends the test of Fan, Zhang and Zhang [22]; Tripathi and Kitamura [46] studied a test for conditional moment restrictions; Chen, Härdle and Li [12] proposed an EL test for time series regression models. See also [37] for survival data.

We now formulate the EL for the transitional density at a fixed $(x,y)$. For $t = 1, \ldots, n$, let $q_t(x,y)$ be nonnegative weights allocated to $(X_t, X_{t+1})$. The EL evaluated at $\tilde p_{\hat\theta}(y\mid x)$ is

$$L\{\tilde p_{\hat\theta}(y\mid x)\} = \max\prod_{t=1}^{n} q_t(x,y) \tag{3.1}$$

subject to $\sum_{t=1}^{n} q_t(x,y) = 1$ and

$$\sum_{t=1}^{n} q_t(x,y)K_h(x - X_t)K_h(y - X_{t+1}) = \tilde p_{\hat\theta}(y\mid x)\,\hat\pi(x). \tag{3.2}$$
t=i 
By introducing a Lagrange multiplier $\lambda(x,y)$, the optimal weights as solutions to (3.1) and (3.2) are

$$q_t(x,y) = n^{-1}\{1 + \lambda(x,y)T_t(x,y)\}^{-1}, \tag{3.3}$$

where $T_t(x,y) = K_h(x - X_t)K_h(y - X_{t+1}) - \tilde p_{\hat\theta}(y\mid x)\hat\pi(x)$ and $\lambda(x,y)$ is the root of

$$\sum_{t=1}^{n}\frac{T_t(x,y)}{1 + \lambda(x,y)T_t(x,y)} = 0. \tag{3.4}$$

The overall maximum EL is achieved at $q_t(x,y) = n^{-1}$, which maximizes (3.1) without constraint (3.2). Hence, the log-EL ratio is

$$\ell\{\tilde p_{\hat\theta}(y\mid x)\} = -2\log\bigl[L\{\tilde p_{\hat\theta}(y\mid x)\}\,n^{n}\bigr] = 2\sum_{t=1}^{n}\log\{1 + \lambda(x,y)T_t(x,y)\}. \tag{3.5}$$

It may be shown by derivations similar to those in Chen, Härdle and Li [12] that

$$\sup_{(x,y)\in S}|\lambda(x,y)| = o_p\{(nh^2)^{-1/2}\log(n)\}. \tag{3.6}$$

Let $U_1(x,y) = (nh^2)^{-1}\sum_{t=1}^{n}T_t(x,y)$ and $U_2(x,y) = (nh^2)^{-1}\sum_{t=1}^{n}T_t^2(x,y)$. From (3.4) and (3.6), $\lambda(x,y) = U_1(x,y)U_2^{-1}(x,y) + O_p\{(nh^2)^{-1}\log^2(n)\}$ uniformly with respect to $(x,y) \in S$. This leads to

$$\begin{aligned}\ell\{\tilde p_{\hat\theta}(y\mid x)\} &= nh^2U_1^2(x,y)U_2^{-1}(x,y) + O_p\{(nh^2)^{-1/2}\log^3(n)\}\\ &= nh^2\frac{\{\hat p(y\mid x) - \tilde p_{\hat\theta}(y\mid x)\}^2}{V(y\mid x)} + O_p\{h^2 + n^{-1/2}h^{-1/2}\log^3(n)\}\end{aligned} \tag{3.7}$$

uniformly for $(x,y) \in S$, where $V(y\mid x) = R^2(K)p(y\mid x)\pi^{-1}(x)$. Hence, the EL ratio is a Studentized local goodness-of-fit measure between $\hat p(y\mid x)$ and $\tilde p_{\hat\theta}(y\mid x)$, as $\operatorname{Var}\{\hat p(y\mid x)\} = (nh^2)^{-1}V(y\mid x)\{1 + o(1)\}$.
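At a single grid point, (3.4) is a one-dimensional root-finding problem: its left side is strictly decreasing in $\lambda$ on the interval where every $1 + \lambda T_t(x,y)$ is positive. The Python sketch below (function name hypothetical) solves it by bisection, which is our choice; any safeguarded root finder would do. It returns the log-EL ratio (3.5) and lets one check numerically that the weights (3.3) sum to one and that the ratio is close to the quadratic approximation $(\sum_t T_t)^2/\sum_t T_t^2$ underlying the leading term of (3.7).

```python
import numpy as np

def el_log_ratio(T, iters=200):
    """Local log-EL ratio (3.5) from the values T_t(x, y), t = 1, ..., n.

    Returns (ell, lam), where lam is the root of (3.4) found by bisection.
    If 0 is not strictly inside the range of the T_t, no valid weights
    exist and the ratio is reported as infinite.
    """
    T = np.asarray(T, dtype=float)
    if T.min() >= 0.0 or T.max() <= 0.0:
        return np.inf, np.nan
    # On (lo, hi) every 1 + lam * T_t is positive, and the left side of
    # (3.4) decreases strictly from +inf to -inf, so the root is unique.
    lo, hi = -1.0 / T.max(), -1.0 / T.min()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(T / (1.0 + mid * T)) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * T)), lam

# Toy values standing in for T_t(x, y) at one grid point
rng = np.random.default_rng(2)
T = rng.normal(0.05, 1.0, size=2000)
ell, lam = el_log_ratio(T)
weights = 1.0 / (T.size * (1.0 + lam * T))    # optimal weights (3.3)
quadratic = T.sum() ** 2 / np.sum(T ** 2)      # leading-order approximation
```

Repeating this nonlinear solve at every grid point, for every bandwidth and every bootstrap path, is what makes the EL version expensive.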

Integrating the EL ratio against a weight function $\omega(\cdot,\cdot)$ supported on $S$, the global goodness-of-fit measure based on a single bandwidth is

$$N(h) = \iint\ell\{\tilde p_{\hat\theta}(y\mid x)\}\,\omega(x,y)\,dx\,dy. \tag{3.8}$$

To make the test less dependent on a single bandwidth $h$, we compute $N(h)$ over a bandwidth set $\mathcal{H} = \{h_k\}_{k=1}^{J}$, where $h_k/h_{k+1} = a$ for some $a \in (0,1)$. The choice of $\mathcal{H}$ can be guided by the cross-validation method of Fan and Yim [20] or other bandwidth selection methods; see Section 6 for more discussion and demonstration. This formulation is motivated by Fan [14] and Horowitz and Spokoiny [33], both of whom considered achieving the optimal convergence rate for the distance between a null hypothesis and a series of local alternative hypotheses in testing regression models. To the best of our knowledge, Fan [14] was the first to propose the adaptive test and showed its oracle property. The adaptive result is more explicitly given in Fan and Huang [17]. Fan, Zhang and Zhang [22] also explicitly adapted the multifrequency test of Fan [14] into the multiscale test and obtained the adaptive minimax result. As we are concerned with testing against a fixed alternative only, it is adequate to have a finite number of bandwidths in $\mathcal{H}$ in our context.

The final test statistic based on the bandwidth set $\mathcal{H}$ is

$$L_n = \max_{1\le k\le J}\frac{N(h_k) - 1}{\sqrt{2}\,h_k}, \tag{3.9}$$

where the standardization reflects that $\operatorname{Var}\{N(h)\} = O(2h^2)$, as shown in the Appendix.



4. Main results. Our theoretical results are based on the following as- 
sumptions. 



Assumption 1. (i) The process $\{X_t\}$ is strictly stationary and $\alpha$-mixing with mixing coefficient $\alpha(t) \le C_\alpha a^t$, where

$$\alpha(t) = \sup\{|P(A\cap B) - P(A)P(B)| : A \in \Omega_1^s,\ B \in \Omega_{s+t}^{\infty}\}$$

for all $s, t \ge 1$, where $C_\alpha$ is a finite positive constant, $\Omega_i^j$ denotes the $\sigma$-field generated by $\{X_t : i \le t \le j\}$, and $a$ is a constant in $(0,1)$.

(ii) $K(\cdot)$ is a bounded symmetric probability density supported on $[-1,1]$ and has a bounded second derivative; and $\omega(x,y)$ is a bounded probability density supported on $S$.

(iii) For the bandwidth set $\mathcal{H}$, $h_1 = c_1 n^{-\gamma_1}$ and $h_J = c_J n^{-\gamma_2}$, in which 7 < 72 < 7i < j, $c_1$ and $c_J$ are constants satisfying $0 < c_1, c_J < \infty$, and $J$ is a positive integer not depending on $n$.



Assumption 2. (i) Each of the diffusion processes given in (1.1) and (1.2) admits a unique weak solution and possesses a transitional density, with $p(y\mid x) = p(y\mid x,\Delta)$ for (1.1) and $p_\theta(y\mid x) = p_\theta(y\mid x,\Delta)$ for (1.2).

(ii) Let $p_{s_1,s_2,\ldots,s_l}(\cdot)$ be the joint probability density of $(X_{1+s_1},\ldots,X_{1+s_l})$. Assume that each $p_{s_1,s_2,\ldots,s_l}(x)$ is three times differentiable in $x \in \mathcal{X}^l$ for $1 \le l \le 6$.

(iii) The parameter space $\Theta$ is an open subset of $\mathbb{R}^d$ and $p_\theta(y\mid x)$ is three times differentiable in $\theta \in \Theta$. For every $\theta \in \Theta$, $\mu(x;\theta)$ and $\sigma^2(x;\theta)$, and $\mu(x)$ and $\sigma^2(x)$ are all three times continuously differentiable in $x \in \mathcal{X}$, and both $\sigma(x)$ and $\sigma(x;\theta)$ are positive for $x \in S$ and $\theta \in \Theta$.
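Assumption 3. (i) $E\bigl[\bigl(\frac{\partial p_\theta(X_{t+1}\mid X_t)}{\partial\theta}\bigr)\bigl(\frac{\partial p_\theta(X_{t+1}\mid X_t)}{\partial\theta}\bigr)^{T}\bigr]$ is of full rank. Let $G(x,y)$ be a positive and integrable function with $E[\max_{1\le t\le n}G(X_t,X_{t+1})] < \infty$ uniformly in $n \ge 1$ such that $\sup_{\theta\in\Theta}|p_\theta(y\mid x)|^2 \le G(x,y)$ and $\sup_{\theta\in\Theta}\|\nabla_\theta^{j}p_\theta(y\mid x)\|^2 \le G(x,y)$ for all $(x,y) \in S$ and $j = 1,2,3$, where $\nabla_\theta p_\theta(\cdot\mid\cdot) = \frac{\partial p_\theta(\cdot\mid\cdot)}{\partial\theta}$, $\nabla_\theta^2 p_\theta(\cdot\mid\cdot) = \frac{\partial^2 p_\theta(\cdot\mid\cdot)}{\partial\theta^2}$ and $\nabla_\theta^3 p_\theta(\cdot\mid\cdot) = \frac{\partial^3 p_\theta(\cdot\mid\cdot)}{\partial\theta^3}$.

(ii) $p(y\mid x) \ge c_1 > 0$ for all $(x,y) \in S$, and the stationary density $\pi(x) \ge c_2 > 0$ for all $x \in S_x$, which is the projection of $S$ on $\mathcal{X}$.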

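Assumption 4. Under either $H_0$ or $H_1$, there is a $\theta^* \in \Theta$ and a sequence of positive constants $\{a_n\}$ that diverges to infinity such that, for any $\varepsilon > 0$ and some $C > 0$, $\lim_{n\to\infty}P(a_n\|\hat\theta - \theta^*\| > C) < \varepsilon$ and $\sqrt{n}\,h\,a_n^{-1} = o(\sqrt{h})$ for any $h \in \mathcal{H}$.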

Assumption 1(i) imposes the strict stationarity and $\alpha$-mixing condition on $\{X_t\}$. Under certain conditions, such as Assumption A2 of Aït-Sahalia [1] and Conditions (A4) and (A5) of Genon-Catalot, Jeantheau and Laredo [26], Assumption 1(i) holds. Assumptions 1(ii) and (iii) are quite standard conditions imposed on the kernel and the bandwidth in kernel estimation. Assumption 2 is needed to ensure the existence and uniqueness of a solution and the transitional density function of the diffusion process. Such an assumption may be implied by Assumptions 1–3 of Aït-Sahalia [3], which also cover nonstationary cases. For the stationary case, Assumptions A0 and A1 of Aït-Sahalia [1] ensure the existence and uniqueness of a stationary solution of the diffusion process. Assumption 3 imposes additional conditions to ensure the smoothness of the transitional density and the identifiability of the parametric transitional density. The $\theta^*$ in Assumption 4 is the true parameter $\theta_0$ under $H_0$. When $H_1$ is true, $\theta^*$ can be regarded as a projection of the parameter estimator $\hat\theta$ onto the null parameter space. Assumption 4 also requires that $a_n$, the rate of convergence of $\hat\theta$ to $\theta^*$, is faster than $\sqrt{nh}$, the convergence rate for the kernel transitional density estimation. This is certainly satisfied when $\hat\theta$ converges at the rate $\sqrt{n}$, as attained by maximum likelihood estimation. Our use of the general convergence rate $a_n$ for the parameter estimation is to cover situations where the parameter estimator has a slower rate than $\sqrt{n}$, for instance, when estimation is based on certain forms of discretization which require $\Delta \to 0$ in order to be consistent (Lo [39]).

Let $K^{(2)}(z,c) = \int K(u)K(z + cu)\,du$, a generalization of the convolution of $K$; let $\nu(t) = \int\{K^{(2)}(tu,t)\}^2\,du\int\{K^{(2)}(v,t)\}^2\,dv$; and let

$$\Sigma_J = R^{-4}(K)\iint\omega^2(x,y)\,dx\,dy\,\bigl(\nu(a^{|i-j|})\bigr)_{J\times J}$$

be a $J \times J$ matrix, where $a$ is the fixed factor used in the construction of $\mathcal{H}$. Furthermore, let $\mathbf{1}_J$ be a $J$-dimensional vector of ones and $\beta$ = ^rjc) II ^(y) u3 ^ x ' 1 y) dx dy.

Theorem 1. Under Assumptions 1–4 and $H_0$, $L_n \xrightarrow{d} \max_{1\le k\le J}Z_k$ as $n \to \infty$, where $Z = (Z_1,\ldots,Z_J)^T \sim N(\beta\mathbf{1}_J, \Sigma_J)$.

Theorem 1 contains a slight surprise in that the mean of $Z$ is nonzero. This is because, although the variance of $\tilde p_{\hat\theta}(y\mid x)$ is of a smaller order than that of $\hat p(y\mid x)$, it contributes to the second-order mean of $N(h)$, which emerges after dividing by $\sqrt{2}\,h$ in (3.9). However, this does not affect the validity of $L_n$ as a test statistic.

We are reluctant to formulate a test based on Theorem 1, as the convergence would be slow. Instead, we propose the following parametric bootstrap procedure to approximate $l_\alpha$, the $1 - \alpha$ quantile of $L_n$, for a nominal significance level $\alpha \in (0,1)$:

Step 1. Generate an initial value $X_0^*$ from the estimated stationary density $\pi_{\hat\theta}(\cdot)$. Then simulate a sample path $\{X_t^*\}_{t=1}^{n+1}$ at the same sampling interval $\Delta$ according to $dX_t = \mu(X_t;\hat\theta)\,dt + \sigma(X_t;\hat\theta)\,dB_t$.

Step 2. Let $\hat\theta^*$ be the estimate of $\theta$ based on $\{X_t^*\}_{t=1}^{n+1}$. Compute the test statistic $L_n$ based on the resampled path and denote it by $L_n^*$.

Step 3. For a large positive integer $B$, repeat Steps 1 and 2 $B$ times and obtain, after ranking, $L_n^{*1} \le L_n^{*2} \le \cdots \le L_n^{*B}$.

Let $l_\alpha^*$ be the $1 - \alpha$ quantile of $L_n^*$ satisfying $P(L_n^* \ge l_\alpha^* \mid \{X_t\}_{t=1}^{n+1}) = \alpha$. A Monte Carlo approximation of $l_\alpha^*$ is $L_n^{*([B(1-\alpha)]+1)}$. The proposed test rejects $H_0$ if $L_n > l_\alpha^*$.
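Steps 1–3 and the rejection rule can be organized as a small driver. In this Python sketch the three callables (path simulator, estimator and statistic) are hypothetical placeholders for the model-specific pieces, and the Monte Carlo approximation of $l^*_\alpha$ is taken to be the $([B(1-\alpha)]+1)$-th smallest bootstrap statistic, a standard order-statistic convention that we assume here.

```python
import numpy as np

def bootstrap_critical_value(theta_hat, simulate_path, estimate, statistic,
                             B=250, alpha=0.05, rng=None):
    """Parametric bootstrap approximation of l*_alpha (Steps 1-3).

    simulate_path(theta, rng) -> path from the fitted model (Step 1)
    estimate(path)            -> theta_hat* re-estimated on that path (Step 2)
    statistic(path, theta)    -> test statistic L_n evaluated at theta (Step 2)
    All three callables are hypothetical placeholders supplied by the user.
    """
    rng = np.random.default_rng() if rng is None else rng
    stats = np.empty(B)
    for b in range(B):
        path = simulate_path(theta_hat, rng)          # Step 1
        stats[b] = statistic(path, estimate(path))    # Step 2
    stats.sort()                                      # Step 3: ranked L_n^*
    k = int(np.floor(B * (1.0 - alpha))) + 1          # rank [B(1 - alpha)] + 1
    return stats[k - 1]

def reject_h0(L_n, critical_value):
    """Reject H0 when the observed L_n exceeds the bootstrap critical value."""
    return L_n > critical_value
```

With $B = 250$ and $\alpha = 0.05$ as in Section 6, the rule picks the 238th smallest of the 250 bootstrap statistics.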

The following theorem is the bootstrap version of Theorem 1, establishing the convergence in joint distribution of the bootstrap version of the test statistics.

Theorem 2. Under Assumptions 1–4, given $\{X_t\}_{t=1}^{n+1}$, the conditional distribution of $L_n^*$ converges in probability to the distribution of $\max_{1\le k\le J}Z_k$ as $n \to \infty$, where $(Z_1,\ldots,Z_J)^T \sim N(\beta\mathbf{1}_J, \Sigma_J)$.

The next theorem shows that the proposed EL test based on the bootstrap calibration has correct size asymptotically under $H_0$ and is consistent under $H_1$.

Theorem 3. Under Assumptions 1–4, $\lim_{n\to\infty}P(L_n > l_\alpha^*) = \alpha$ under $H_0$; and $\lim_{n\to\infty}P(L_n > l_\alpha^*) = 1$ under $H_1$.
5. Computation. The computation of the proposed EL test statistic $N(h)$ involves first computing the local EL ratio $\ell\{\tilde p_{\hat\theta}(y\mid x)\}$ over a grid of $(x,y)$-points within the set $S \subset \mathcal{X}^2$. The number of grid points should be large enough to ensure a good approximation by the Riemann sum. On top of this is the bootstrap procedure, which replicates the above computation a large number of times. The most time-consuming component of the computation is the nonlinear optimization carried out when obtaining the local EL ratio $\ell\{\tilde p_{\hat\theta}(y\mid x)\}$. The combination of the EL and the bootstrap makes the computation intensive.
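The Riemann-sum step and the combination over bandwidths in (3.8) and (3.9) reduce to a few array operations. The Python sketch below assumes the local ratios have already been computed on a rectangular grid of cells of equal area, and it takes the standardization in (3.9) to be $\sqrt{2}\,h_k$; the function names are hypothetical.

```python
import numpy as np

def integrated_el(ell_grid, omega_grid, cell_area):
    """Riemann-sum approximation of N(h) in (3.8) over equal-area grid cells."""
    return float(np.sum(ell_grid * omega_grid) * cell_area)

def combined_statistic(N_values, bandwidths):
    """L_n of (3.9): the maximum over k of {N(h_k) - 1} / (sqrt(2) h_k)."""
    N_values = np.asarray(N_values, dtype=float)
    bandwidths = np.asarray(bandwidths, dtype=float)
    return float(np.max((N_values - 1.0) / (np.sqrt(2.0) * bandwidths)))

# Toy check: local ratios equal to 1 and a uniform weight on [0, 1] x [0, 2]
# (density 1/2, 100 x 200 cells) give N(h) = 1.
ell_grid = np.ones((100, 200))
omega_grid = np.full((100, 200), 0.5)
cell_area = (1.0 / 100) * (2.0 / 200)
N = integrated_el(ell_grid, omega_grid, cell_area)
```

In a full implementation the grid would cover only $S$, with $\omega$ set to zero outside it.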

In the following, we consider using a simpler version of the EL, the least squares empirical likelihood (LSEL), to formulate the test statistic. The log-LSEL ratio evaluated at $\tilde p_{\hat\theta}(y\mid x)$ is

$$\ell^{ls}\{\tilde p_{\hat\theta}(y\mid x)\} = \min\sum_{t=1}^{n}\{nq_t(x,y) - 1\}^2$$

subject to $\sum_{t=1}^{n}q_t(x,y) = 1$ and $\sum_{t=1}^{n}q_t(x,y)T_t(x,y) = 0$. Let $T(x,y) = \sum_{t=1}^{n}T_t(x,y)$ and $S(x,y) = \sum_{t=1}^{n}T_t^2(x,y)$. The LSEL is much easier to compute, as there are closed-form solutions for the weights $q_t(x,y)$, which avoids the expensive nonlinear optimization of the EL computation. According to Brown and Chen [8], the LSEL weights are $q_t(x,y) = n^{-1}[1 + \{n^{-1}T(x,y) - T_t(x,y)\}S^{-1}(x,y)T(x,y)]$ and

$$\ell^{ls}\{\tilde p_{\hat\theta}(y\mid x)\} = S^{-1}(x,y)T^2(x,y),$$

which is readily computable. The LSEL counterpart to $N(h)$ is

$$N^{ls}(h) = \iint\ell^{ls}\{\tilde p_{\hat\theta}(y\mid x)\}\,\omega(x,y)\,dx\,dy,$$

and the final test statistic $L_n$ becomes $\max_{h\in\mathcal{H}}(\sqrt{2}\,h)^{-1}\{N^{ls}(h) - 1\}$. It can be shown from Brown and Chen [8] that $N^{ls}(h)$ and $N(h)$ are equivalent to the first order. Therefore, the first-order results in Theorems 1, 2 and 3 continue to hold for the LSEL formulation. One may simply use this less expensive LSEL to carry out the testing. In fact, the LSEL formulation was used in all the simulation studies reported in the next section.
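The closed-form LSEL ratio needs no optimization once the $T_t(x,y)$ are available. The Python sketch below follows the text with $S = \sum_t T_t^2$; the optional centered variant $S = \sum_t (T_t - T/n)^2$ is our addition, under which the Brown–Chen weights satisfy the moment constraint exactly and $\ell^{ls}$ equals $\sum_t\{nq_t - 1\}^2$. The two versions differ only at higher order.

```python
import numpy as np

def lsel_log_ratio(T, centered=False):
    """Closed-form LSEL ratio S^{-1} T^2 and the Brown-Chen weights q_t.

    T holds T_t(x, y), t = 1, ..., n. centered=False uses S = sum T_t^2 as
    in the text; centered=True uses S = sum (T_t - T/n)^2, for which the
    weights satisfy sum q_t T_t = 0 exactly.
    """
    T = np.asarray(T, dtype=float)
    n = T.size
    T_sum = T.sum()
    S = np.sum((T - T_sum / n) ** 2) if centered else np.sum(T ** 2)
    q = (1.0 + (T_sum / n - T) * T_sum / S) / n
    return T_sum ** 2 / S, q

# Toy values standing in for T_t(x, y) at one grid point
rng = np.random.default_rng(3)
T = rng.normal(0.05, 1.0, size=2000)
ell, q = lsel_log_ratio(T)
ell_c, q_c = lsel_log_ratio(T, centered=True)
```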

One may use the leading-order term in the expansion of the EL test statistic as given in (3.7) as the local test statistic. However, doing so would require estimation of secondary "parameters," like $p(y\mid x)$ and $\pi(x)$. The use of the LSEL or EL avoids the secondary estimation, as they Studentize automatically via their respective optimization procedures.

6. Simulation studies. We report results of simulation studies which were designed to evaluate the empirical performance of the proposed EL test. To gain information on its relative performance, Hong and Li's test is also performed in the same simulation.
Throughout the paper, the biweight kernel $K(u) = \frac{15}{16}(1 - u^2)^2 I(|u| \le 1)$ was used in all the kernel estimation. In the simulation, we set $\Delta = 1/12$, implying monthly observations, which coincides with the sampling frequency of the federal funds rate data to be analyzed. We chose $n = 125$, $250$ or $500$, corresponding roughly to 10 to 40 years of data. The number of simulations was 500 and the number of bootstrap resample paths was $B = 250$.
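The kernel constants that enter the bias and variance expressions after (2.1) are easy to confirm for the biweight kernel: analytically $\sigma_K^2 = 1/7$ and $R(K) = 5/7$. A quick numerical check in Python (the grid size is arbitrary):

```python
import numpy as np

def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 I(|u| <= 1)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

# Fine grid on the support; all integrands vanish at +/-1, so a plain
# Riemann sum coincides with the trapezoidal rule here.
u = np.linspace(-1.0, 1.0, 200001)
du = u[1] - u[0]
mass = np.sum(biweight(u)) * du              # should be 1
sigma_K2 = np.sum(u**2 * biweight(u)) * du   # sigma_K^2, analytically 1/7
R_K = np.sum(biweight(u) ** 2) * du          # R(K), analytically 5/7
```

With these values, $V(y\mid x) = R^2(K)p(y\mid x)/\pi(x) = (25/49)\,p(y\mid x)/\pi(x)$ for the biweight kernel.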

6.1. Size evaluation. Two simulation studies were carried out to evaluate 
the size of the proposed test for both the Vasicek and Cox, Ingersoll and 
Ross (CIR) [13] diffusion models. 

6.1.1. Vasicek models. We first consider testing the Vasicek model

$$dX_t = \kappa(\alpha - X_t)\,dt + \sigma\,dB_t.$$

The vector of parameters $\theta = (\alpha, \kappa, \sigma^2)$ takes three sets of values, which correspond to Model −2, Model 0 and Model 2 of Pritsker [42]. The baseline Model 0 assigns $\kappa_0 = 0.85837$, $\alpha_0 = 0.089102$ and $\sigma_0^2 = 0.0021854$, which match the estimates of Aït-Sahalia [1] for an interest rate data set. Model −2 is obtained by quadrupling $\kappa_0$ and $\sigma_0^2$, and Model 2 by halving $\kappa_0$ and $\sigma_0^2$ twice, while keeping $\alpha_0$ unchanged. The three models have the same marginal distribution $N(\alpha_0, V_0)$, where $V_0 = \sigma_0^2/(2\kappa_0) \approx 0.001273$. Despite the stationary distribution being the same, the models offer different levels of dependence, as quantified by the mean-reversion parameter $\kappa$. From Model −2 to Model 2, the process becomes more dependent as $\kappa$ gets smaller.
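The Vasicek transition law is Gaussian, $X_{t+\Delta}\mid X_t = x \sim N\bigl(\alpha + (x - \alpha)e^{-\kappa\Delta},\ \sigma^2(1 - e^{-2\kappa\Delta})/(2\kappa)\bigr)$, so sample paths for these three models can be simulated exactly, without discretization error. The Python sketch below is one way to do this (function and variable names are ours); it also makes visible that all three models share the stationary variance $\sigma^2/(2\kappa)$ while the lag-one autocorrelation $e^{-\kappa\Delta}$ increases from Model −2 to Model 2.

```python
import numpy as np

# (kappa, alpha, sigma^2) for Models -2, 0 and 2 as described above
BASE = (0.85837, 0.089102, 0.0021854)
MODELS = {
    -2: (4 * BASE[0], BASE[1], 4 * BASE[2]),   # quadruple kappa and sigma^2
    0: BASE,
    2: (BASE[0] / 4, BASE[1], BASE[2] / 4),    # halve kappa and sigma^2 twice
}

def vasicek_transition(x, kappa, alpha, sigma2, delta):
    """Mean and variance of the Gaussian transition X_{t+delta} | X_t = x."""
    phi = np.exp(-kappa * delta)
    return alpha + (x - alpha) * phi, sigma2 * (1.0 - phi**2) / (2.0 * kappa)

def simulate_vasicek(n_obs, kappa, alpha, sigma2, delta, rng):
    """Exact simulation of n_obs equally spaced observations, started from N(alpha, V)."""
    x = np.empty(n_obs)
    x[0] = rng.normal(alpha, np.sqrt(sigma2 / (2.0 * kappa)))
    for t in range(1, n_obs):
        m, v = vasicek_transition(x[t - 1], kappa, alpha, sigma2, delta)
        x[t] = rng.normal(m, np.sqrt(v))
    return x
```

The same transition density $p_\theta(y\mid x)$ is also what enters (2.3) and the bootstrap Step 1 for this model.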

The region $S$ was chosen based on the underlying transitional density so that the region attained more than 90% of the probability. This is consistent with our earlier recommendation to choose $S$ based on the kernel estimate of the transitional density. In particular, for Models −2, 0 and 2, it was chosen by rotating $[0.035, 0.25] \times [-0.03, 0.03]$, $[0.03, 0.22] \times [-0.02, 0.02]$ and $[0.02, 0.22] \times [-0.009, 0.009]$, respectively, 45 degrees anticlockwise. The weight function is $\omega(x,y) = |S|^{-1}I\{(x,y) \in S\}$, where $|S|$ is the area of $S$.

Both the cross-validation (CV) method and the reference to a bivariate normal distribution (the Scott rule, Scott [45]) were used to select the bandwidth set $\mathcal{H}$. A table in a full report of this paper (Chen, Gao and Tang [11]) gives the average bandwidths obtained by the two methods. We observed that, for each given $n$, regardless of which method was used, the chosen bandwidth became smaller as the model was shifted from Model −2 to Model 2. This indicated that both methods took into account the changing level of dependence induced by these models. We considered two methods for choosing the bandwidth set. The first was to choose six bandwidths for each combination of model and sample size such that the average $h_{cv}$ lay within the lower range of $\mathcal{H}$. The second approach was to select $h_{ref}$ given by the Scott rule for each sample and then choose the other $h$-values in the set by setting $a = 0.95$, so that $h_{ref}$ is the third smallest bandwidth in the set of six. This second approach could be regarded as data-driven, as it differed from sample to sample.
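The data-driven bandwidth set described above is a geometric grid anchored at $h_{ref}$. In this Python sketch, `bandwidth_set` builds $\{h_k\}$ with $h_k/h_{k+1} = a$ and $h_{ref}$ in third position, and `scott_reference_bandwidth` uses the bivariate Scott-type rate $n^{-1/6}$ with the sample standard deviation of the observations as the scale; that choice of scale is our assumption about the exact reference rule.

```python
import numpy as np

def bandwidth_set(h_ref, a=0.95, J=6, position=3):
    """Geometric set h_1 < ... < h_J with h_k / h_{k+1} = a and h_ref as the
    position-th smallest element (a = 0.95, J = 6, position = 3 in the text)."""
    k = np.arange(1, J + 1)
    return h_ref * a ** (position - k)

def scott_reference_bandwidth(X):
    """Scott-type reference bandwidth s * n^{-1/6} for bivariate data
    (scale s = sample standard deviation of X_t, an assumption here)."""
    X = np.asarray(X, dtype=float)
    return X.std(ddof=1) * len(X) ** (-1.0 / 6.0)
```

The same $n^{-1/6}$ rate appears in the Scott rule used for Hong and Li's test below.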

The maximum likelihood estimator was used to estimate $\theta$ in each simulation and in each resample of the bootstrap. Again, a table in Chen, Gao and Tang [11] summarizes the quality of the parameter estimation, which showed that the estimation of $\kappa$ was subject to severe bias when the mean reversion is weak. The deterioration in the quality of the estimates, especially for $\kappa$, as the dependence became stronger was quite alarming.
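For the Vasicek model, the exact discretization is a Gaussian AR(1) regression, so the maximum likelihood estimator (conditional on the first observation) has a closed form. The Python sketch below is illustrative and not necessarily the exact estimator used in these simulations; it requires the fitted autoregressive coefficient to lie in $(0,1)$. Because $\kappa$ is recovered as $-\log\hat\phi/\Delta$, small errors in $\hat\phi$ near one translate into large relative errors in $\hat\kappa$, which is in line with the poor estimation of $\kappa$ under weak mean reversion noted above.

```python
import numpy as np

def vasicek_mle(X, delta):
    """Conditional Gaussian MLE of (kappa, alpha, sigma^2) for the Vasicek model,
    via the exact AR(1) form X_{t+1} = c + phi X_t + e_t (requires 0 < phi < 1)."""
    X = np.asarray(X, dtype=float)
    x0, x1 = X[:-1], X[1:]
    A = np.column_stack([np.ones_like(x0), x0])
    (c, phi), *_ = np.linalg.lstsq(A, x1, rcond=None)   # OLS = conditional MLE
    resid = x1 - c - phi * x0
    v = np.mean(resid**2)                  # MLE of the transition variance
    kappa = -np.log(phi) / delta
    alpha = c / (1.0 - phi)
    sigma2 = 2.0 * kappa * v / (1.0 - phi**2)
    return kappa, alpha, sigma2
```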

The average sizes of the proposed test at the nominal size of 5% using the two bandwidth set selection rules are reported in Table 1. It shows that the sizes of the proposed test under both bandwidth set selections were consistently quite close to the nominal level for the sample sizes considered. For Model 2, which has the weakest mean reversion, there was some size distortion when $n = 125$ for the fixed bandwidth selection. However, it was significantly alleviated when $n$ was increased. The message conveyed by Table 1 is that we need not have a large number of years of data in order to achieve a reasonable size for the test. Table 1 also reports the single-bandwidth-based test based on $N(h)$ and the asymptotic normality conveyed by Theorem 1 with $J = 1$. However, the asymptotic test has severe size distortion, which highlights the need for the bootstrap procedure.

We then carried out a simulation for the test of Hong and Li [32]. The Scott rule adopted by Hong and Li was used to get an initial bandwidth $h_{scott} = S_z n^{-1/6}$, where $S_z$ is the sample standard deviation of the transformed series. There was little difference in the average of $h_{scott}$ among the three Vasicek models in the simulation. We used the one corresponding to Vasicek −2. We then chose two equally spaced bandwidths below and two above the average $h_{scott}$. The nominal 5% test at each bandwidth was carried out with lag value 1. For the sample sizes considered, the sizes of the test did not settle well at the nominal level, similar to what happened for the asymptotic test reported in Table 1. We then carried out the proposed parametric bootstrap procedure for Hong and Li's test. As shown in Table 2, the bootstrap largely improved the size of the test.

6.1.2. CIR models. We then conducted a simulation on the CIR process

$$dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dB_t \tag{6.1}$$

to see whether the pattern of results observed for the Vasicek models holds for the CIR models. The parameters were as follows: $\kappa = 0.89218$, $\alpha = 0.09045$ and $\sigma^2 = 0.032742$ in the first model (CIR 0); $\kappa = 0.44609$, $\alpha = 0.09045$
Table 1
Empirical sizes (in percentage) of the proposed EL test (the last two columns) and the single-bandwidth-based test (in the middle) for the Vasicek models, as well as those of the single-bandwidth test based on the asymptotic normality (in round brackets): α1, size based on the fixed bandwidth set; α2, size based on the data-driven bandwidths

A: Model −2
n = 125   Bandwidths   ?       ?       0.034   0.036   ?       0.041
          Size         ?       ?       5.2     4.6     ?       ?        α1 = ?     α2 = ?
          Asymptotic   (?)     (?)     (?)     (?)     (?)     (?)
n = 250   Bandwidths   0.022   0.023   0.024   0.026   0.0269  0.0284
          Size         ?       ?       4.6     4.4     ?       ?        α1 = ?     α2 = ?
          Asymptotic   (?)     (?)     (24)    (21)    (17.4)  (16)
n = 500   Bandwidths   0.02    0.021   0.022   0.023   0.0245  ?
          Size         ?       ?       5.4     5.2     ?       ?        α1 = ?     α2 = ?
          Asymptotic   (?)     (19)    (14.6)  (?)     (?)     (?)

B: Model 0
n = 125   Bandwidths   0.016   0.017   0.019   0.020   0.022   0.024
          Size         5.8     6       6       4.2     4.4     3        α1 = 4.2   α2 = 7
          Asymptotic   (43)    (39)    (36.6)  (34.8)  (36.6)  (37)
n = 250   Bandwidths   0.014   0.015   0.017   0.018   0.02    0.022
          Size         6       6.2     6.2     3.8     2.4     2.8      α1 = 5.2   α2 = 5.8
          Asymptotic   (31.6)  (27)    (20.6)  (20)    (17.8)  (17.8)
n = 500   Bandwidths   0.01    0.011   0.012   0.013   0.015   0.016
          Size         6.8     4.4     5.2     6.4     5.6     4        α1 = 5.4   α2 = 6.2
          Asymptotic   (36.4)  (26.8)  (20.6)  (13)    (11.2)  (9.2)

C: Model 2
n = 125   Bandwidths   0.008   0.009   0.010   0.011   0.013   0.014
          Size         12.6    11      10      14.6    14.4    13.6     α1 = 12.6  α2 = 3.4
          Asymptotic   (60)    (53.4)  (47.2)  (46)    (45)    (42.2)
n = 250   Bandwidths   0.006   0.007   0.008   0.009   0.01    0.011
          Size         12.2    10      7.4     8.8     7       11       α1 = 8.8   α2 = 4.2
          Asymptotic   (39)    (35)    (31)    (30)    (31)    (33.2)
n = 500   Bandwidths   0.004   0.005   0.0054  0.0063  0.0074  0.0086
          Size         8.2     8.4     8       8.6     7       9        α1 = 7.2   α2 = 5.6
          Asymptotic   (75.2)  (63)    (51.6)  (39.8)  (32.8)  (24.4)

and $\sigma^2 = 0.016371$ in model CIR 1; and $\kappa = 0.22305$, $\alpha = 0.09045$ and
$\sigma^2 = 0.008186$ in model CIR 2. The CIR process was the model used in Pritsker [42]
for power evaluation. The region S was obtained by rotating a rectangle 45 degrees
anticlockwise: $[0.015, 0.25] \times [-0.015, 0.015]$ for CIR 0, $[0.015, 0.25] \times [-0.012, 0.012]$
for CIR 1 and $[0.015, 0.25] \times [-0.008, 0.008]$ for CIR 2. All the regions have a coverage
probability of at least 0.90.
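Because the critical values are obtained by a parametric bootstrap, resamples have to be drawn from the fitted model. A minimal sketch for the CIR case, assuming the usual parametrisation $dX_t = \kappa(\alpha - X_t)\,dt + \sigma\sqrt{X_t}\,dB_t$ (so that the triples above are $(\kappa, \alpha, \sigma^2)$), uses the exact scaled noncentral chi-square transition rather than an Euler scheme. The helper name `simulate_cir`, the starting value and the sampling interval `delta` are illustrative assumptions; the spacing used in the study is not stated in this section.

```python
import numpy as np


def simulate_cir(kappa, alpha, sigma2, x0, n, delta, seed=None):
    """Hypothetical helper: draw n+1 equally spaced CIR observations.

    Assumes dX = kappa*(alpha - X) dt + sqrt(sigma2 * X) dB and uses the exact
    transition X_{t+delta} | X_t ~ c * noncentral_chi2(df, X_t * exp(-kappa*delta) / c).
    """
    rng = np.random.default_rng(seed)
    decay = np.exp(-kappa * delta)
    c = sigma2 * (1.0 - decay) / (4.0 * kappa)  # scale of the transition
    df = 4.0 * kappa * alpha / sigma2           # degrees of freedom
    x = np.empty(n + 1)
    x[0] = x0
    for t in range(n):
        nonc = x[t] * decay / c                 # noncentrality parameter
        x[t + 1] = c * rng.noncentral_chisquare(df, nonc)
    return x


# CIR 0 parameters from the text; delta = 1/12 is an assumed monthly spacing.
path = simulate_cir(0.89218, 0.09045, 0.032742, x0=0.09045, n=500, delta=1 / 12, seed=1)
```

The stationary mean of this process is $\alpha$, which gives a quick sanity check on long simulated paths.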

Table 3 reports the sizes of the proposed test based on the two bandwidth
sets, as well as those of the single bandwidth-based tests that involve the bootstrap
Table 2
Asymptotic and bootstrap (in parentheses) sizes of Hong and Li's test for the Vasicek models

n = 125   h             0.08         0.1          0.12         0.14         0.16
          Vasicek −2    11.8 (6.2)   3.8 (5.8)    1.2 (5.2)    0.6 (4.8)    0.6 (4.4)
          Vasicek       13.2 (4.6)   4.2 (4.2)    1.2 (3.8)    0.8 (3.4)    0.8 (3.4)
          Vasicek 2     11.8 (4.4)   4.2 (3.2)    2 (3.2)      1.4 (2.8)    1.6 (2.8)
n = 250   h             0.07         0.09         0.11         0.13         0.15
          Vasicek −2    15 (6.2)     6.2 (5.4)    2.2 (6)      1.6 (5.2)    1 (6.4)
          Vasicek       13 (4.8)     5.8 (5.4)    3.4 (5.4)    1.8 (5.8)    2 (6.6)
          Vasicek 2     15 (5.4)     7.4 (5.4)    3 (5.8)      1.6 (6.6)    1.4 (5.4)
n = 500   h             0.06         0.08         0.1          0.12         0.14
          Vasicek −2    15.6 (5.6)   7.6 (5.6)    2.8 (4.6)    2.4 (4.8)    1.2 (6.2)
          Vasicek       19.4 (5.4)   8.4 (5.6)    3.4 (5.4)    2.6 (6.2)    1.6 (6.6)
          Vasicek 2     17 (6.6)     9 (5.8)      4.6 (5.4)    3.2 (6.6)    2.4 (7.6)


Table 3
Empirical sizes (in percentage) of the proposed EL test (the last two columns) and the
single bandwidth-based tests (in the middle) for the CIR models. α1: size based on the
fixed bandwidth set; α2: size based on the data-driven bandwidths

                     Single bandwidth-based tests                        α1     α2
A: CIR 0
n = 125  Bandwidths  0.022   0.025   0.029   0.033   0.038   0.044
         Size        4.8     4.8     4.8     3.2     2.4     2           3.0    6.6
n = 250  Bandwidths  0.018   0.021   0.024   0.028   0.032   0.037
         Size        5.2     5.6     5.2     5.2     4       3.8         5.0    6
n = 500  Bandwidths  0.016   0.018   0.021   0.024   0.027   0.031
         Size        4.2     5.4     4.6     4.8     3.6     4.2         4.8    5.6
B: CIR 1
n = 125  Bandwidths  0.017   0.02    0.022   0.026   0.03    0.035
         Size        5.4     3.6     4.0     4.2     3.2     2.6         3.8    7.6
n = 250  Bandwidths  0.014   0.016   0.018   0.021   0.024   0.028
         Size        5.2     6.6     5.6     5.6     5.6     3.4         5.2    5.6
n = 500  Bandwidths  0.012   0.014   0.016   0.018   0.021   0.024
         Size        5.4     3.8     4.4     4.4     4.6     4           5.2    5.2
C: CIR 2
n = 125  Bandwidths  0.012   0.014   0.016   0.018   0.021   0.024
         Size        7.6     7.2     7.6     6.2     6.6     4.2         6.8    8.4
n = 250  Bandwidths  0.01    0.012   0.013   0.015   0.017   0.02
         Size        5.6     6.4     5.8     6.8     6.2     5.4         6.2    7.6
n = 500  Bandwidths  0.008   0.009   0.011   0.012   0.014   0.016
         Size        4       4.2     3.6     4       4.8     3.4         4      6.6
Table 4A
Empirical power (in percentage) of the proposed EL test (last two columns) and of the
single bandwidth-based tests. α1: power based on the fixed bandwidth set; α2: power based
on the data-driven bandwidths

n           Single bandwidth-based tests                        α1     α2
125  h      0.0199  0.0219  0.0241  0.0265  0.0291
     Power  80.4    74      67.2    66.4    65.2                79.8   63.6
250  h      0.0141  0.0158  0.0177  0.0199  0.0223
     Power  87.6    81.2    76.4    74      72.8                88.6   65.2
500  h      0.0113  0.0126  0.0141  0.0157  0.0175
     Power  90.8    84.8    82.8    84.4    80.8                96.8   81.4



simulation. The bandwidth sets were chosen based on the same principle
as outlined for the Vasicek models and are reported in the table. We find
that the proposed test continued to have reasonable size for the three CIR
models despite severe biases in the estimation of $\kappa$. The sizes
of the single bandwidth-based tests, as well as that of the overall test, were quite
respectable for n = 125. It is interesting that, although $\kappa$ was still
poorly estimated for CIR 2, the severe size distortion observed earlier
for Vasicek 2 with the fixed bandwidth set was not present. Hong and Li's test
was also performed for the three CIR models. Its performance was similar
to that for the Vasicek models reported in Table 2 and, hence, is not
reported here.
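The sizes in Tables 1 and 3 are all multiples of 0.2 percentage points, which is consistent with (but does not by itself confirm) 500 replications per cell, the number of simulations quoted in Section 6.2. Under that assumption, and taking a nominal level of 5% (also an assumption here), the binomial Monte Carlo standard error gives a rough yardstick for judging which empirical sizes are "reasonable". A minimal sketch:

```python
import math


def mc_standard_error(p, reps):
    """Binomial standard error of an empirical rejection rate p based on reps runs."""
    return math.sqrt(p * (1.0 - p) / reps)


se = mc_standard_error(0.05, 500)             # about 0.0097
band = (0.05 - 1.96 * se, 0.05 + 1.96 * se)   # roughly (0.031, 0.069)
```

Under these assumptions, empirical sizes between about 3.1% and 6.9% lie within roughly two standard errors of 5%.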

6.2. Power evaluation. To gain information on the power of the proposed
test, we carried out simulations testing for the Vasicek model while the true
process was the CIR, as in Pritsker's power evaluation of Aït-Sahalia's test.
The region S was obtained by rotating $[0.015, 0.25] \times [-0.015, 0.015]$ 45 degrees
anticlockwise. The average CV bandwidths based on 500 simulations
were 0.0202 (standard error 0.0045) for n = 125, 0.016991 (0.00278)
for n = 250 and 0.014651 (0.00203) for n = 500.



Table 4B
Asymptotic and bootstrap power (in round brackets) of Hong and Li's test

n           Asymptotic power (bootstrap power)
125  h      0.08        0.1         0.12        0.14        0.16
     Power  26 (4.6)    16.4 (3.8)  9.6 (2.8)   5.8 (3)     4.8 (3.6)
250  h      0.07        0.09        0.11        0.13        0.15
     Power  41 (5.8)    29 (6)      18.8 (5.4)  14.6 (7.2)  10.4 (6.4)
500  h      0.06        0.08        0.1         0.12        0.14
     Power  57.4 (6)    49 (5.4)    40.2 (5)    34 (6.2)    31.2 (7.2)
Table 4A reports the power of the EL test and the single bandwidth-based
tests, including the fixed bandwidth sets used in the simulation. The tests
had quite good power and, as expected, the power increased with n.
One striking feature is that the power of the EL test tended to be larger than
the maximum power of the single bandwidth-based tests, which indicates
that it is worthwhile to formulate the test over a set of bandwidths.
Table 4B reports the power of Hong and Li's test. While the bootstrap
calibration improved the size of that test, it greatly reduced its power.
The power reduction was quite alarming; in some cases, the test
had little power. Our simulation results for Hong and Li's test were similar
to those reported in Aït-Sahalia, Fan and Peng [5].

7. Case studies. We apply the proposed test to the federal funds rate
data set between January 1963 and December 1998, which has n = 432
observations. Aït-Sahalia [2] used this data set to demonstrate the performance
of maximum likelihood estimation. We test five popular one-factor
diffusion models that have been proposed to model interest rate dynamics.
In addition to the Vasicek and CIR processes, we consider

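$$dX_t = X_t\{\kappa - (\sigma^2 - \kappa\alpha)X_t\}\,dt + \sigma X_t^{3/2}\,dB_t, \tag{7.1}$$

$$dX_t = \kappa(\alpha - X_t)\,dt + \sigma X_t^{\rho}\,dB_t, \tag{7.2}$$

$$dX_t = (\alpha_{-1}X_t^{-1} + \alpha_0 + \alpha_1 X_t + \alpha_2 X_t^2)\,dt + \sigma X_t^{3/2}\,dB_t. \tag{7.3}$$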

They are, respectively, the inverse of the CIR process (ICIR) (7.1), the constant
elasticity of volatility (CEV) model (7.2) and the nonlinear drift
(NL) model (7.3) of Aït-Sahalia [1].

The data are displayed in Figure 1(a), which indicates strong dependence:
the points scatter within a narrow band around the 45-degree line.
Volatility increased when the rate was larger than 12%. The
model-implied transitional densities under the above five diffusion models
are displayed in the other panels of Figure 1 using the MLEs given in
Aït-Sahalia [2], which were also used in the formulation of the proposed test
statistic. Figure 1 shows that the densities implied by the inverse CIR, the
CEV and the nonlinear drift models were similar to each other, and were
quite different from those of the Vasicek and CIR models. The bandwidths
prescribed by the Scott rule and by CV for the kernel estimation were,
respectively, $h_{ref} = 0.007616$ and $h_{cv} = 0.00129$. Plotting the density surfaces
indicated that a reasonable range for h was from 0.007 to 0.02, which spans
degrees of smoothness from slight undersmoothing to slight oversmoothing.
This led to a bandwidth set consisting of J = 7 bandwidths with $h_1 = 0.007$,
$h_J = 0.020$ and $a = 0.8434$.
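The bandwidth set is described by $h_1$, $h_J$, J and a, with the exact rule given in an earlier section. A minimal sketch, assuming a geometric grid $h_k = h_J a^{J-k}$ for $k = 1, \ldots, J$ (this form is an assumption, not quoted from the paper), is below. With a = 0.8434 it gives a smallest bandwidth of about 0.0072 rather than exactly 0.007, so the paper's rule may differ in detail (for instance in rounding or in how a is defined).

```python
def geometric_bandwidths(h_max, a, J):
    """Hypothetical geometric grid h_k = h_max * a**(J - k), k = 1, ..., J (increasing)."""
    return [h_max * a ** (J - k) for k in range(1, J + 1)]


hs = geometric_bandwidths(0.020, 0.8434, 7)
# Ratio that would make the grid hit both quoted endpoints exactly.
a_exact = (0.007 / 0.020) ** (1 / 6)   # about 0.8395
```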

Kernel transitional density estimates and the smoothed model-implied
transitional densities for the five models are plotted in Figure 2 for h =
0.007. By comparing Figure 2 with Figure 1, we notice the effect of kernel
smoothing on these model-implied densities. In formulating the final test
statistic $L_n$, we chose

$$N(h) = \frac{1}{n}\sum_{t=1}^{n} \ell\{\tilde p_{\hat\theta}(X_{t+1}|X_t)\}\,\omega_1(X_t, X_{t+1}), \tag{7.4}$$

where $\omega_1$ is a uniform weight over the region obtained by rotating $[0.005, 0.4] \times [-0.03, 0.03]$
45 degrees anticlockwise. The region contains all the data values of the
pair $(X_t, X_{t+1})$. As seen from (7.4), N(h) is asymptotically equivalent to
the statistic defined in (3.8) with $\omega(x,y) = p(x,y)\,\omega_1(x,y)$.
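The rotated regions used for S and for $\omega_1$ are easy to handle in rotated coordinates. A minimal sketch, assuming the rectangle is rotated about the origin and that the "uniform weight" is a constant density normalised to integrate to one over the region (both are assumptions; the normalisation is not stated here), maps a point back by the inverse rotation and checks the rectangle bounds. The function names are hypothetical.

```python
import math


def in_rotated_region(x, y, u_range, v_range, angle_deg=45.0):
    """True if (x, y) lies in the rectangle u_range x v_range after that rectangle
    has been rotated anticlockwise by angle_deg about the origin (assumed centre)."""
    a = math.radians(angle_deg)
    # Undo the anticlockwise rotation: rotate the point clockwise by the same angle.
    u = math.cos(a) * x + math.sin(a) * y
    v = -math.sin(a) * x + math.cos(a) * y
    return u_range[0] <= u <= u_range[1] and v_range[0] <= v <= v_range[1]


def uniform_weight(x, y, u_range, v_range):
    """Hypothetical omega_1: constant density on the rotated region, zero outside."""
    area = (u_range[1] - u_range[0]) * (v_range[1] - v_range[0])  # rotation preserves area
    return 1.0 / area if in_rotated_region(x, y, u_range, v_range) else 0.0


U, V = (0.005, 0.4), (-0.03, 0.03)   # rectangle quoted for the federal funds analysis
```

For instance, the diagonal point (0.05, 0.05) falls inside the region, while (0.05, 0.15), which lies far from the 45-degree line, does not.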

The p-values of the proposed tests, obtained from 500 bootstrap resamples, are
reported in Table 5. They show little empirical support for the Vasicek model and
quite weak support for the CIR. Surprisingly, there was some empirical support
for the inverse CIR, the CEV and the nonlinear drift models. In particular, for the
CEV and the nonlinear drift models, the p-values of the single bandwidth-based
tests were all quite supportive, even for small bandwidths. Indeed, Figure 2 shows
quite noticeable agreement between the nonparametric kernel density
estimates and the smoothed densities implied by the CEV and nonlinear
drift models.
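Given the observed statistic $L_n$ and the 500 resampled values $L_n^*$, a common way to turn them into a p-value and an upper-α critical value, as reported in Table 5, is sketched below. The exact conventions (for example, the treatment of ties, or whether the observed statistic is added to the bootstrap sample) are assumptions, not taken from the paper.

```python
import math


def bootstrap_p_value(stat, boot_stats):
    """Proportion of bootstrap statistics at least as large as the observed one."""
    return sum(b >= stat for b in boot_stats) / len(boot_stats)


def bootstrap_critical_value(boot_stats, alpha=0.05):
    """Empirical upper-alpha quantile: about alpha * B bootstrap values exceed it."""
    ordered = sorted(boot_stats)
    k = math.ceil(round((1.0 - alpha) * len(ordered), 9)) - 1
    return ordered[max(k, 0)]
```

The test rejects when $L_n$ exceeds the critical value; in Table 5, for example, the CIR statistic 12.80 lies below its critical value 22.27.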

8. Conclusion. The proposed test shares some features with the
test proposed in Hong and Li [32]. For instance, both can be used to test
continuous-time and discrete-time Markov processes by focusing on the
specification of the transitional density. An advantage of Hong and Li's test is its
better handling of nonstationary processes. The proposed test is based on
a direct comparison between the kernel estimate and the smoothed model-implied
transitional density, whereas Hong and Li's test makes an indirect comparison
after the probability integral transformation. An advantage of the
direct approach is its robustness against poor-quality parameter estimation,
which is common for weakly mean-reverting diffusion models. This is
because both the shape and the orientation of the transitional density are
much less affected by poor-quality parameter estimation. Another aspect
is that Hong and Li's test is based on asymptotic normality and can be affected
by slow convergence, despite the fact that the transformed
series is asymptotically independent. Indeed, our simulation showed that it
is necessary to implement the bootstrap procedure for Hong and Li's test.
The last and most important aspect is that a test based on the conditional
distribution transformation tends to have lower power than
a direct test based on the transitional density. This is indicated
by our simulation study, as well as by that of Aït-Sahalia, Fan and Peng [5].
Interested readers may consult Aït-Sahalia, Fan and Peng [5] for more insights.

Our proposed test is formulated for univariate diffusion processes. An
extension to multivariate diffusion processes can be made by replacing the
Table 5
P-values for the federal funds rate data

Model                          Vasicek   CIR     ICIR    CEV      NL
Test statistic $L_n$           29.71     12.80   66.63   64.56    69.10
Critical value $l^*_{0.05}$    2.54      22.27   303.4   434.77   557.52
p-value                        0.0       0.142   0.294   0.434    0.422


univariate kernel smoothing in estimating the transitional density with multivariate
kernel transitional density estimation. There is no substantial difference
in the formulation of the EL test statistic and the parametric bootstrap
procedure. If the exact form of the transitional density is unknown, which is
more likely for multivariate diffusion processes, the approximate transitional
density expansion of Aït-Sahalia and Kimmel [6] is needed.

APPENDIX 

As the Lagrange multiplier $\lambda(x,y)$ depends implicitly on h, we first need
to extend the convergence rate of $\sup_{(x,y)\in S}\lambda(x,y)$ for a single h, given
in (3.6), so that it holds uniformly over $\mathcal{H}$. To prove Theorem 1, we
need the following lemmas.

Lemma A.1. Under Assumptions 1–4,

$$\max_{h\in\mathcal{H}}\sup_{(x,y)\in S}\lambda(x,y) = o_p\{n^{-1/4}\log(n)\}.$$

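Proof. For any $\delta > 0$,

$$P\Big[\max_{h\in\mathcal{H}}\sup_{(x,y)\in S} h\lambda(x,y) > \delta n^{-1/2}\log(n)\Big] \le \sum_{h\in\mathcal{H}} P\Big\{\sup_{(x,y)\in S} h\lambda(x,y) > \delta n^{-1/2}\log(n)\Big\}.$$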

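As the number of bandwidths in $\mathcal{H}$ is finite, by checking the relevant
derivations in Chen, Härdle and Li [12], it can be shown that

$$P\Big\{\sup_{(x,y)\in S} h\lambda(x,y) > \delta n^{-1/2}\log(n)\Big\} \to 0$$

as $n \to \infty$. This implies that $\max_{h\in\mathcal{H}}\sup_{(x,y)\in S} h\lambda(x,y) = o_p\{n^{-1/2}\log(n)\}$.
The lemma is then established by noting that $h_1$, the smallest bandwidth in
$\mathcal{H}$, is of order $n^{-\gamma_1}$, where $\gamma_1 \in (1/7, 1/4)$ as assumed in Assumption 1, so that
$\max_{h}\sup_{(x,y)}\lambda(x,y) \le h_1^{-1}\max_{h}\sup_{(x,y)} h\lambda(x,y) = o_p\{n^{\gamma_1 - 1/2}\log(n)\}$ with $\gamma_1 - 1/2 < -1/4$. □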






Before introducing more lemmas, we present some expansions for the EL
test statistic N(h). Let

$$\tilde p_\theta(x,y) = \tilde p_\theta(y|x)\,\hat\pi(x)$$

and

$$\tilde p(x,y) = n^{-1}\sum_{t=1}^{n+1} K_h(x - X_t)\sum_{s=1}^{n+1} w_s(y)\,p(X_s|X_t)$$

be the kernel smoothed versions of the parametric and nonparametric joint
densities $p_\theta(x,y)$ and $p(x,y)$, respectively. Due to the relationship between
transitional and joint densities,

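$$\begin{aligned}
N(h) &= (nh^2)R^{-2}(K)\iint \frac{\{\hat p(x,y) - \tilde p_{\hat\theta}(x,y)\}^2}{p(x,y)}\,\omega(x,y)\,dx\,dy + O_p\{h^2 + (nh^2)^{-1/2}\log^3(n)\} \\
&= (nh^2)R^{-2}(K)\iint \Big[\frac{\{\hat p(x,y) - \tilde p(x,y)\}^2}{p(x,y)} + \frac{2\{\hat p(x,y) - \tilde p(x,y)\}\{\tilde p(x,y) - \tilde p_{\hat\theta}(x,y)\}}{p(x,y)} + \frac{\{\tilde p(x,y) - \tilde p_{\hat\theta}(x,y)\}^2}{p(x,y)}\Big] \\
&\qquad \times \omega(x,y)\,dx\,dy + O_p\{h^2 + (nh^2)^{-1/2}\log^3(n)\} \\
&=: N_1(h) + N_{2\hat\theta}(h) + N_{3\hat\theta}(h) + O_p\{h^2 + (nh^2)^{-1/2}\log^3(n)\}.
\end{aligned} \tag{A.1}$$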

Here and throughout the proofs, $\tilde o(\delta_n)$ and $\tilde O(\delta_n)$ denote stochastic quantities
that are respectively $o(\delta_n)$ and $O(\delta_n)$ uniformly over S for a nonnegative
sequence $\{\delta_n\}$.

Using Assumptions 3 and 4, we have $N_{i\hat\theta}(h) = N_{i\theta^*}(h) + o_p(h)$ for i = 2, 3, where
$\theta^* = \theta_0$ under $H_0$ and $\theta^* = \theta_1$ under $H_1$. Thus,

$$N(h) = N_1(h) + N_{2\theta^*}(h) + N_{3\theta^*}(h) + \tilde o_p(h) + O_p\{(nh^2)^{-1/2}\log^3(n)\}.$$

We start with some lemmas on $\hat p(x,y)$, $\tilde p(x,y)$ and $\tilde p_\theta(x,y)$. Let $K^{(2)}$ be
the convolution of K, $K^{(2)}(t) = \int K(u)K(t+u)\,du$, let $MK^{(2)}(t) = \int uK(u)K(t+u)\,du$,
and let $p_3(x,y,z)$ be the joint density of $(X_t, X_{t+1}, X_{t+2})$.
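The kernel constants used below can be checked numerically for a particular kernel. The sketch assumes a standard Gaussian K (the paper's kernel is specified elsewhere), takes $R(K) = \int K^2(u)\,du$, which is consistent with (A.8) giving $E\{N_{11}(h)\} \approx 1$, and reads $K^{(4)}$ as the analogous self-convolution of $K^{(2)}$, so that $K^{(4)}(0) = \int\{K^{(2)}(t)\}^2 dt$; this reading of $K^{(4)}$ is an assumption. For the Gaussian kernel the closed forms are $R(K) = K^{(2)}(0) = 1/(2\sqrt{\pi})$ and $K^{(4)}(0) = 1/\sqrt{8\pi}$, while $MK^{(2)}(0) = 0$ for any symmetric kernel.

```python
import math


def gauss(u):
    """Standard Gaussian kernel K(u) (an assumed choice of kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def trapezoid(f, a=-10.0, b=10.0, m=801):
    """Composite trapezoidal rule for the integral of f over [a, b] with m nodes."""
    step = (b - a) / (m - 1)
    total = 0.5 * (f(a) + f(b))
    for i in range(1, m - 1):
        total += f(a + i * step)
    return total * step


def K2(t):
    """Convolution K^(2)(t) = int K(u) K(t + u) du."""
    return trapezoid(lambda u: gauss(u) * gauss(t + u))


def MK2(t):
    """MK^(2)(t) = int u K(u) K(t + u) du."""
    return trapezoid(lambda u: u * gauss(u) * gauss(t + u))


R_K = trapezoid(lambda u: gauss(u) ** 2)    # R(K) = int K^2(u) du
K4_at_0 = trapezoid(lambda t: K2(t) ** 2)   # K^(4)(0) = int {K^(2)(t)}^2 dt (nested quadrature)
```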

The following lemmas are presented without proofs. The detailed proofs 
are given in Chen, Gao and Tang [11]. 
Lemma A.2. Under Assumptions 1–4,

$$\begin{aligned}
\mathrm{Cov}\{\hat p(s_1,t_1), \hat p(s_2,t_2)\}
&= \frac{K^{(2)}((s_2-s_1)/h)\,K^{(2)}((t_2-t_1)/h)\,p(s_1,t_1)}{nh^2} \\
&\quad + \frac{MK^{(2)}((s_2-s_1)/h)\,\partial p(s_1,t_1)/\partial x}{nh} + \frac{MK^{(2)}((t_2-t_1)/h)\,\partial p(s_1,t_1)/\partial y}{nh} \\
&\quad + \frac{p_3(s_1,t_1,t_2)\,K^{(2)}((s_2-t_1)/h) + p_3(s_2,t_2,t_1)\,K^{(2)}((s_1-t_2)/h)}{nh} + \tilde o\{(nh)^{-1}\}.
\end{aligned}$$



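Lemma A.3. Suppose that Assumptions 1–4 hold. Let $\Delta_\theta(x,y) = \{p_\theta(y|x) - p(y|x)\}\pi(x)$. Then

$$E\{\tilde p_\theta(x,y) - \tilde p(x,y)\} = \Delta_\theta(x,y) + \frac{h^2}{2}\Big(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\Big)\Delta_\theta(x,y) + \tilde O(h^3), \tag{A.2}$$

$$E\{\tilde p_\theta(x,y) - \hat p(x,y)\} = \Delta_\theta(x,y) + \frac{h^2}{2}\Big(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\Big)\Delta_\theta(x,y) + \tilde O(h^3), \tag{A.3}$$

$$\mathrm{Cov}\{\tilde p(s_1,t_1), \tilde p(s_2,t_2)\} = \frac{K^{(2)}((t_2-t_1)/h)\,p(s_1,t_1)\,p(s_2,t_1)}{nh\,\pi(t_2)} + \tilde o\{(nh)^{-1}\}. \tag{A.4}$$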



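Lemma A.4. Under Assumptions 1–4, we have

$$\mathrm{Cov}\{\hat p(s_1,t_1), \tilde p(s_2,t_2)\} = \frac{1}{nh\,\pi(t_2)}\Big\{\cdots\Big(\frac{t_2-s_1}{h}\Big)\cdots + \cdots\Big(\frac{t_2-s_1}{h}\Big)\cdots p(s_2,s_1)\Big\}.$$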



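Lemma A.5. If $H_0$ is true, then $N_{2\theta^*}(h) = N_{3\theta^*}(h) = 0$ for all $h \in \mathcal{H}$.

Proof. Under $H_0$, $p(y|x) = p_{\theta_0}(y|x)$ and $\theta^* = \theta_0$. Hence,

$$\tilde p(x,y) - \tilde p_{\theta^*}(x,y) = n^{-1}\sum_{t} K_h(x - X_t)\sum_{s} w_s(y)\{p(X_s|X_t) - p_{\theta_0}(X_s|X_t)\} = 0. \qquad \square$$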
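Let us now study the leading term $N_1(h)$. From (A.1) and by suppressing the
variables of integration,

$$\begin{aligned}
N_1(h) &= \frac{nh^2}{R^2(K)}\iint \frac{\{\hat p - \tilde p\}^2}{p}\,\omega \\
&= \frac{nh^2}{R^2(K)}\iint \Big[\frac{\{\hat p - E\hat p\}^2 + \{E\tilde p - \tilde p\}^2 + \{E\hat p - E\tilde p\}^2}{p} \\
&\qquad + \frac{2\{\hat p - E\hat p\}\{E\hat p - E\tilde p\} + 2\{\hat p - E\hat p\}\{E\tilde p - \tilde p\} + 2\{E\hat p - E\tilde p\}\{E\tilde p - \tilde p\}}{p}\Big]\omega \\
&=: \sum_{i=1}^{6} N_{1i}(h).
\end{aligned}$$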

We show in the following lemmas that $N_{11}(h)$ dominates $N_1(h)$ and that the
$N_{1j}(h)$ for $j \ge 2$ are all negligible except $N_{12}(h)$, which contributes to
the mean of $N_1(h)$ in the second order.

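Lemma A.6. Under Assumptions 1–4, uniformly with respect to $h \in \mathcal{H}$,

$$h^{-1}E\{N_{11}(h) - 1\} = o(1), \tag{A.5}$$

$$\mathrm{Var}\{h^{-1}N_{11}(h)\} = \frac{2K^{(4)}(0)}{R^4(K)}\iint \omega^2(x,y)\,dx\,dy + o(1), \tag{A.6}$$

$$\mathrm{Cov}\{h_i^{-1}N_{11}(h_i),\ h_j^{-1}N_{11}(h_j)\} = \cdots\iint \omega^2(x,y)\,dx\,dy + o(1). \tag{A.7}$$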



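Proof. From Lemma A.2 and the fact that $MK^{(2)}(0) = 0$,

$$\begin{aligned}
E\{N_{11}(h)\} &= \frac{nh^2}{R^2(K)}\iint \frac{\mathrm{Var}\{\hat p(x,y)\}}{p(x,y)}\,\omega(x,y)\,dx\,dy \\
&= \frac{1}{R^2(K)}\iint \frac{\{K^{(2)}(0)\}^2 p(x,y) + 2hK^{(2)}((y-x)/h)\,p_3(x,y,y)}{p(x,y)}\,\omega(x,y)\,dx\,dy\,\{1 + o(1)\} \\
&= 1 + O(h^2),
\end{aligned} \tag{A.8}$$

which leads to (A.5). To derive (A.6), let

$$Z_n(s,t) = \frac{(nh^2)^{1/2}\{\hat p(s,t) - E\hat p(s,t)\}}{R(K)\,p^{1/2}(s,t)}.$$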
It may be shown, from the fact that K is bounded and another assumed regularity
condition, that $E\{|Z_n(s_1,t_1)|^{2+\epsilon}|Z_n(s_2,t_2)|^{2+\epsilon}\} < M$ for some positive
$\epsilon$ and M. Hence, $\{Z_n(s,t)\}_{n\ge1}$ and $\{Z_n^2(s_1,t_1)Z_n^2(s_2,t_2)\}_{n\ge1}$ are
uniformly integrable. Also,

$$(Z_n(s_1,t_1), Z_n(s_2,t_2))^T \xrightarrow{d} (Z(s_1,t_1), Z(s_2,t_2))^T,$$

which is a bivariate normal random vector with mean zero and covariance matrix

$$\begin{pmatrix} 1 & g\{(s_1,t_1),(s_2,t_2)\} \\ g\{(s_1,t_1),(s_2,t_2)\} & 1 \end{pmatrix},$$

where $g\{(s_1,t_1),(s_2,t_2)\} = K^{(2)}\big(\frac{s_2-s_1}{h}\big)K^{(2)}\big(\frac{t_2-t_1}{h}\big)\frac{p^{1/2}(s_1,t_1)}{R^2(K)\,p^{1/2}(s_2,t_2)}$. Hence, by
ignoring smaller order terms,

$$\begin{aligned}
\mathrm{Var}\{N_{11}(h)\} &= \iiiint \mathrm{Cov}\{Z_n^2(s_1,t_1), Z_n^2(s_2,t_2)\}\,\omega(s_1,t_1)\,\omega(s_2,t_2)\,ds_1\,dt_1\,ds_2\,dt_2 \\
&= \iiiint \mathrm{Cov}\{Z^2(s_1,t_1), Z^2(s_2,t_2)\}\,\omega(s_1,t_1)\,\omega(s_2,t_2)\,ds_1\,dt_1\,ds_2\,dt_2 \\
&= 2\iiiint \mathrm{Cov}^2\{Z(s_1,t_1), Z(s_2,t_2)\}\,\omega(s_1,t_1)\,\omega(s_2,t_2)\,ds_1\,dt_1\,ds_2\,dt_2 \\
&= \frac{2}{R^4(K)}\iiiint \Big\{K^{(2)}\Big(\frac{s_2-s_1}{h}\Big)K^{(2)}\Big(\frac{t_2-t_1}{h}\Big)\Big\}^2\frac{p(s_1,t_1)}{p(s_2,t_2)}\,\omega(s_1,t_1)\,\omega(s_2,t_2)\,ds_1\,dt_1\,ds_2\,dt_2 \\
&= \frac{2h^2K^{(4)}(0)}{R^4(K)}\iint \omega^2(s,t)\,ds\,dt.
\end{aligned} \tag{A.9}$$

In the third equation above, we use a fact about fourth product moments of normal
random variables: for a zero-mean bivariate normal vector $(Z_1, Z_2)$,
$\mathrm{Cov}(Z_1^2, Z_2^2) = 2\,\mathrm{Cov}^2(Z_1, Z_2)$. Combining (A.8) and (A.9), we obtain (A.5) and
(A.6). It can be checked that they hold uniformly for all $h \in \mathcal{H}$.

The proof of (A.7) follows from that of (A.6). □
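The Gaussian identity used in the third equation of (A.9), $\mathrm{Cov}(Z_1^2, Z_2^2) = 2\,\mathrm{Cov}^2(Z_1, Z_2)$ (a special case of Isserlis' theorem), can be checked by simulation. The sketch below is only a numerical illustration for standard normals with correlation `rho`, not part of the proof; the function name is hypothetical.

```python
import random


def cov_of_squares_mc(rho, draws=200_000, seed=1):
    """Monte Carlo estimate of Cov(Z1^2, Z2^2) for standard normals with Corr(Z1, Z2) = rho."""
    rng = random.Random(seed)
    scale = (1.0 - rho * rho) ** 0.5
    s1 = s2 = s12 = 0.0
    for _ in range(draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)  # gives Corr(z1, z2) = rho
        a, b = z1 * z1, z2 * z2
        s1 += a
        s2 += b
        s12 += a * b
    return s12 / draws - (s1 / draws) * (s2 / draws)


estimate = cov_of_squares_mc(0.6)   # should be close to 2 * 0.6**2 = 0.72
```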

The proof of the following lemma is given in Chen, Gao and Tang [11].

Lemma A.7. Under Assumptions 1–4, uniformly with respect to $h \in \mathcal{H}$,

$$h^{-1}N_{12}(h) = \frac{1}{R(K)}\iint \frac{p(x,y)}{\pi(y)}\,\omega(x,y)\,dx\,dy + o_p(1), \tag{A.10}$$

$$h^{-1}N_{1j}(h) = o_p(1) \quad \text{for } j \ge 3. \tag{A.11}$$

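Let $L(h) = (\sqrt{2}h)^{-1}\{N(h) - 1\}$ and $\beta = \frac{1}{\sqrt{2}R(K)}\iint \frac{p(x,y)}{\pi(y)}\,\omega(x,y)\,dx\,dy$. In
view of Lemmas A.5, A.6 and A.7, we have, under $H_0$, uniformly with respect to $h \in \mathcal{H}$,

$$L(h) = \frac{1}{\sqrt{2}h}\{N_{11}(h) - 1\} + \beta + o_p(1). \tag{A.12}$$

Define $L_1(h) = \frac{1}{\sqrt{2}h}\{N_{11}(h) - 1\}$.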

Lemma A.8. Under Assumptions 1–4 and $H_0$, as $n \to \infty$,
$(L(h_1), \ldots, L(h_J))^T \xrightarrow{d} N_J(\beta\mathbf{1}_J, \Sigma_J)$.

Proof. According to the Cramér–Wold device, it suffices to show

$$\sum_{i=1}^{J} c_i L(h_i) \xrightarrow{d} N(\beta c^T\mathbf{1}_J,\ c^T\Sigma_J c) \tag{A.13}$$

for an arbitrary vector of constants $c = (c_1, \ldots, c_J)^T$. Without loss of generality,
we only prove the case J = 2. To apply Lemma A.1 of Gao and King [25], we
introduce the following notation. For i = 1, 2, define $d_i = c_i/(\sqrt{2}h_i)$ and
$\xi_t = (X_t, X_{t+1})$,

$$\phi_i(\xi_s, \xi_t) = \iint \frac{\epsilon_{si}(x,y)\,\epsilon_{ti}(x,y)}{p(x,y)R^2(K)}\,\omega(x,y)\,dx\,dy \quad (t \ne s),$$

$$\phi_{st} = \phi(\xi_s, \xi_t) = \sum_{i=1}^{2} d_i\,\phi_i(\xi_s, \xi_t) \quad\text{and}\quad L(h_1,h_2) = \sum_{t=2}^{n}\sum_{s=1}^{t-1}\phi_{st}.$$

It is noted that, for any given $s, t \ge 1$ and fixed x and y, $E[\phi(x, \xi_t)] =
E[\phi(\xi_s, y)] = 0$. It suffices to verify

$$\max\{M_n, N_n\}\,h^{-2} \to 0 \quad\text{as } n \to \infty, \tag{A.14}$$

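where

$$M_n = \max\{n^2 M_{n1}^{1/(1+\delta)},\ \ldots\},$$

$$N_n = \max\{n^{3/2}M_{n21}^{1/(1+\delta)},\ n^{3/2}M_{n22}^{1/(2(1+\delta))},\ n^{3/2}M_{n3}^{1/2},\ n^{3/2}M_{n4}^{1/(2(1+\delta))},\ n^{3/2}M_{n7}^{1/(1+\delta)}\},$$

$$M_{n1} = \max_{1\le i<j<k\le n}\max\Big\{E|\phi_{ik}\phi_{jk}|^{1+\delta},\ \int|\phi_{ik}\phi_{jk}|^{1+\delta}\,dP(\xi_i)\,dP(\xi_j,\xi_k)\Big\},$$

$$M_{n21} = \max_{1\le i<j<k\le n}\max\Big\{E|\phi_{ik}\phi_{jk}|^{2(1+\delta)},\ \int|\phi_{ik}\phi_{jk}|^{2(1+\delta)}\,dP(\xi_i)\,dP(\xi_j,\xi_k)\Big\},$$

$$M_{n22} = \max_{1\le i<j<k\le n}\max\{\cdots\},$$

$$M_{n3} = \max_{1\le i<j<k\le n} E|\phi_{ik}\phi_{jk}|,$$

$$M_{n4} = \max_{1\le i,j,k\le 2n;\ i,j,k\ \mathrm{distinct}}\ \max_{P}\int \phi_{ik}\phi_{jk}\phi_{ik}\phi_{jk}\,dP,$$

$$M_{n51} = \max_{1\le i<j<k\le n}\max E\Big|\int \phi_{ik}\phi_{jk}\,dP(\xi_i)\Big|^{2(1+\delta)},$$

$$M_{n52} = \max_{1\le i<j<k\le n}\max \int\Big|\int \phi_{ik}\phi_{jk}\phi_{ik}\phi_{jk}\,dP(\xi_i)\Big|^{2(1+\delta)}dP(\xi_j)\,dP(\xi_k),$$

$$M_{n6} = \max_{1\le i<j<k\le n} E\Big|\int \phi_{ik}\phi_{jk}\,dP(\xi_i)\Big|,$$

$$M_{n7} = \max_{1\le i<j\le n} E[|\phi_{ij}|^{1+\delta}],$$

where the maximization in $M_{n4}$ is over the probability measures $P(\xi_{i_1},\xi_{i_2},\xi_{i_3},\xi_{i_4})$,
$P(\xi_{i_1})P(\xi_{i_2},\xi_{i_3},\xi_{i_4})$, $P(\xi_{i_1})P(\xi_{i_2})P(\xi_{i_3},\xi_{i_4})$ and $P(\xi_{i_1})P(\xi_{i_2})P(\xi_{i_3})P(\xi_{i_4})$.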

Without confusion, we replace $h_1$ by h for simplicity. To verify the $M_n$
part of (A.14), we verify only

$$\lim_{n\to\infty} n^2h^{-2}M_{n1}^{1/(1+\delta)} = 0. \tag{A.15}$$

Let $q(x,y) = \omega(x,y)\,p^{-1}(x,y)$ and

$$\psi_{ik} = \frac{1}{nh^2}\iint K\Big(\frac{x-X_i}{h}\Big)K\Big(\frac{y-X_{i+1}}{h}\Big)K\Big(\frac{x-X_k}{h}\Big)K\Big(\frac{y-X_{k+1}}{h}\Big)q(x,y)\,dx\,dy$$
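for $1 \le i < j < k \le n$. Direct calculation implies

$$\begin{aligned}
\psi_{ik}\psi_{jk} &= (nh^2)^{-2}\iiiint K\Big(\frac{x-X_i}{h}\Big)K\Big(\frac{y-X_{i+1}}{h}\Big)K\Big(\frac{x-X_k}{h}\Big)K\Big(\frac{y-X_{k+1}}{h}\Big)q(x,y) \\
&\qquad \times K\Big(\frac{u-X_j}{h}\Big)K\Big(\frac{v-X_{j+1}}{h}\Big)K\Big(\frac{u-X_k}{h}\Big)K\Big(\frac{v-X_{k+1}}{h}\Big)q(u,v)\,dx\,dy\,du\,dv \\
&= b_{ijk} + \delta_{ijk},
\end{aligned}$$

where $\delta_{ijk} = \psi_{ik}\psi_{jk} - b_{ijk}$ and

$$b_{ijk} = n^{-2}q(X_i,X_{i+1})\,q(X_j,X_{j+1})\,K^{(2)}\Big(\frac{X_i-X_k}{h}\Big)K^{(2)}\Big(\frac{X_{i+1}-X_{k+1}}{h}\Big)K^{(2)}\Big(\frac{X_j-X_k}{h}\Big)K^{(2)}\Big(\frac{X_{j+1}-X_{k+1}}{h}\Big).$$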

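For any given $1 < \zeta < 2$ and n sufficiently large, we may show that

$$\begin{aligned}
M_{n1} &\le 2\big(E[|b_{ijk}|^\zeta] + E[|\delta_{ijk}|^\zeta]\big) = 2E[|b_{ijk}|^\zeta]\{1 + o(1)\} \\
&\le \frac{C}{n^{2\zeta}}\int\!\cdots\!\int |q(x,y)q(u,v)|^\zeta\,\Big|K^{(2)}\Big(\frac{x-z}{h}\Big)K^{(2)}\Big(\frac{y-w}{h}\Big)K^{(2)}\Big(\frac{u-z}{h}\Big)K^{(2)}\Big(\frac{v-w}{h}\Big)\Big|^\zeta \\
&\qquad \times p(x,y,u,v,z,w)\,dx\,dy\,du\,dv\,dz\,dw \\
&\le C_1 n^{-2\zeta}h^4,
\end{aligned} \tag{A.16}$$

where $p(x,y,u,v,z,w)$ denotes the joint density of $(X_i, X_{i+1}, X_j, X_{j+1}, X_k, X_{k+1})$
and $C_1$ is a constant. Thus, taking $\zeta = 1 + \delta$ with $\delta \in (0,1)$, as $n \to \infty$,

$$n^2h^{-2}M_{n1}^{1/(1+\delta)} \le Cn^2h^{-2}\{n^{-2(1+\delta)}h^4\}^{1/(1+\delta)} = Ch^{2(1-\delta)/(1+\delta)} \to 0. \tag{A.17}$$

Hence, (A.17) shows that (A.15) holds for the first part of $M_{n1}$. The proof
for the second part of $M_{n1}$ follows similarly. □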

Proof of Theorem 1. From (A.12) and Lemma A.8, we have, under $H_0$,

$$(L(h_1), \ldots, L(h_J))^T \xrightarrow{d} N_J(\beta\mathbf{1}_J, \Sigma_J).$$

Let $Z = (Z_1, \ldots, Z_J)^T \sim N_J(\beta\mathbf{1}_J, \Sigma_J)$. By the mapping theorem, under $H_0$,

$$L_n = \max_{h\in\mathcal{H}} L(h) \xrightarrow{d} \max_{1\le k\le J} Z_k. \tag{A.18}$$

Hence, the theorem is established. □

Let $l_{0\alpha}$ be the upper-α quantile of $\max_{1\le j\le J} Z_j$. As the distribution of
$N_J(\beta\mathbf{1}_J, \Sigma_J)$ is free of n, so is that of $\max_{1\le j\le J} Z_j$. Hence, $l_{0\alpha}$ is a fixed
quantity with respect to n.
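Because $l_{0\alpha}$ depends only on $\beta$ and $\Sigma_J$, it could in principle be approximated by simulating from $N_J(\beta\mathbf{1}_J, \Sigma_J)$; the paper instead calibrates the test with the parametric bootstrap. The sketch below shows the simulation route for user-supplied (hypothetical) values of $\beta$ and $\Sigma_J$, using NumPy's `multivariate_normal` and `quantile`.

```python
import numpy as np


def max_normal_upper_quantile(beta, sigma, alpha=0.05, draws=200_000, seed=0):
    """Monte Carlo upper-alpha quantile of max_k Z_k with Z ~ N_J(beta * 1_J, sigma)."""
    sigma = np.asarray(sigma, dtype=float)
    J = sigma.shape[0]
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(mean=np.full(J, float(beta)), cov=sigma, size=draws)
    return float(np.quantile(Z.max(axis=1), 1.0 - alpha))


# J = 1 recovers the usual one-sided normal critical value (about 1.645);
# with J = 3 independent components the quantile rises to about 2.12.
q1 = max_normal_upper_quantile(0.0, [[1.0]])
q3 = max_normal_upper_quantile(0.0, np.eye(3))
```

Correlation between the $L(h_j)$ across neighbouring bandwidths pulls the quantile back towards the single-bandwidth value, which is one reason combining several bandwidths costs relatively little in critical value.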

The following lemmas are required for the proof of Theorem 3. 
Lemma A.9. Under Assumptions 1–4, for constants C and $\gamma \in (1/3, 1/2)$,
$\ell\{p(y|x) + C(nh^2)^{-\gamma}\} \to \infty$ in probability uniformly for $(x,y) \in S$.

Proof. Let $\mu(x,y) = p(y|x) + C(nh^2)^{-\gamma}$ and $Q_{t,h}(x,y) = w_{nw,t}(x) \times K_h(y - X_{t+1})$.
Recall from the EL formulation given in Section 3 that
$\ell\{\mu(x,y)\} = 2\sum_{t=1}^{n}\log[1 + \lambda(x,y)\{Q_{t,h}(x,y) - \mu(x,y)\}]$, where, according to
(3.4), $\lambda(x,y)$ satisfies

$$0 = \sum_{t=1}^{n}\frac{Q_{t,h}(x,y) - \mu(x,y)}{1 + \lambda(x,y)\{Q_{t,h}(x,y) - \mu(x,y)\}}.$$
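For a fixed (x, y), $\ell\{\mu\}$ is the usual empirical likelihood ratio statistic for the mean of the values $Q_{t,h}(x,y)$, with λ defined by the equation above. A minimal sketch for a generic list of values `q` finds λ by bisection on the interval where every $1 + \lambda(q_t - \mu)$ stays positive, using the fact that the left-hand side of the equation is decreasing in λ. Plugging in the actual $Q_{t,h}(x,y)$ would require the weights and kernel defined in Section 3, which are not reproduced here; the function name is hypothetical.

```python
import math


def el_log_ratio(q, mu, iters=200):
    """Empirical likelihood statistic 2 * sum_t log(1 + lam * (q_t - mu)) for the
    hypothesised mean mu, where lam solves sum_t (q_t - mu) / (1 + lam * (q_t - mu)) = 0."""
    d = [qt - mu for qt in q]
    dmin, dmax = min(d), max(d)
    if dmin >= 0.0 or dmax <= 0.0:
        # mu is not strictly inside the range of the data: EL ratio is 0, statistic infinite.
        return math.inf
    lo, hi = -1.0 / dmax, -1.0 / dmin  # on (lo, hi) every 1 + lam * d_t is positive
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        g = sum(dt / (1.0 + lam * dt) for dt in d)  # g is decreasing in lam
        if g > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * dt) for dt in d)
```

The statistic is zero at the sample mean and grows as μ moves away from it; Lemma A.9 quantifies this growth when μ is shifted by $C(nh^2)^{-\gamma}$.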



Note that

$$\lambda(x,y)\sum_{t=1}^{n}\frac{\{Q_{t,h}(x,y) - \mu(x,y)\}^2}{1 + \lambda(x,y)\{Q_{t,h}(x,y) - \mu(x,y)\}} = \sum_{t=1}^{n}\{Q_{t,h}(x,y) - \mu(x,y)\}. \tag{A.19}$$

Let $S_h^2(x,y) = n^{-1}\sum_{t=1}^{n}\{Q_{t,h}(x,y) - \mu(x,y)\}^2$. From established results on
the kernel estimator for α-mixing sequences,

$$S_h^2(x,y) = h^{-2}R^2(K)\,p(y|x)\,\pi^{-1}(x) + O_p\{(nh^2)^{-\gamma}\} + o_p(h^{-2}). \tag{A.20}$$

Note that, for a positive constant $M_0$ and sufficiently large n, $\sup_{(x,y)\in S}|Q_{t,h}(x,y) - \mu(x,y)| \le h^{-2}M_0$
with probability one. Hence, (A.19) implies that

$$|\lambda(x,y)|\,S_h^2(x,y) \le \{1 + |\lambda(x,y)|h^{-2}M_0\}\,\Big|n^{-1}\sum_{t=1}^{n}\{Q_{t,h}(x,y) - \mu(x,y)\}\Big|.$$

This, along with (A.20) and the facts that

$$n^{-1}\sum_{t=1}^{n}\{Q_{t,h}(x,y) - p(y|x)\} = O_p\{(nh^2)^{-1/2}\log(n)\}$$

and $\mu(x,y) - p(y|x) = C(nh^2)^{-\gamma}$, implies

$$\lambda(x,y) = O_p\{h^2(nh^2)^{-\gamma}\} \tag{A.21}$$

uniformly with respect to $(x,y) \in S$. The rate of $\lambda(x,y)$ established in (A.21)
allows us to carry out the Taylor expansion in (A.19) and obtain

$$\lambda(x,y) = S_h^{-2}(x,y)\,n^{-1}\sum_{t=1}^{n}\{Q_{t,h}(x,y) - \mu(x,y)\} + \tilde o_p\{h^2(nh^2)^{-2\gamma}\}. \tag{A.22}$$

At the same time, as the $X_t$'s are continuous random variables, by applying the
blocking technique and the Davydov inequality, it can be shown that, for
any $\eta > 0$ and $(x,y) \in S$,

$$P\Big(\min_{1\le t\le n}|Q_{t,h}(x,y) - \mu(x,y)| > \eta\Big) \to 0,$$
which, using (A.19), implies

$$|\lambda(x,y)| \ge Ch^2(nh^2)^{-\gamma}\{1 + \tilde o_p(1)\}. \tag{A.23}$$

From (A.22) and (A.23),

$$\begin{aligned}
\ell\{\mu(x,y)\} &= 2\sum_{t=1}^{n}\log[1 + \lambda(x,y)\{Q_{t,h}(x,y) - \mu(x,y)\}] \\
&= n\lambda(x,y)\,n^{-1}\sum_{t=1}^{n}\{Q_{t,h}(x,y) - \mu(x,y)\} + O_p\{nh^6(nh^2)^{-3\gamma}\} \\
&= n\lambda(x,y)\big[C(nh^2)^{-\gamma} + \tilde o_p\{(nh^2)^{-1/2}\log n\}\big] + O_p\{h^4(nh^2)^{1-3\gamma}\} \\
&= O_p\{(nh^2)^{1-2\gamma}\}.
\end{aligned} \tag{A.24}$$

As $\gamma \in (1/3, 1/2)$, $\ell\{\mu(x,y)\} \to \infty$ in probability as $n \to \infty$. Since both
(A.21) and (A.24) hold uniformly with respect to $(x,y) \in S$, the divergence of
$\ell\{\mu(x,y)\}$ is uniform with respect to $(x,y) \in S$. □

Lemma A.10. Under Assumptions 1–4 and $H_1$, for any fixed real value
x, as $n \to \infty$, $P(L_n > x) \to 1$.

Proof. From the established theory on EL, $\ell\{\mu(x,y)\}$ is a convex function
of $\mu(x,y)$, the candidate for $p(y|x)$. From the mean-value theorem and
the facts that both $\ell(\cdot)$ and $\omega$ are nonnegative and $\ell$ is continuous in (x, y),

$$N(h) \ge C_S\,\ell\{\tilde p_{\hat\theta}(y_0|x_0)\}$$

for some $(x_0, y_0) \in S$ and $C_S > 0$.

Let $\mu_1(x_0,y_0) = \tilde p_{\hat\theta}(y_0|x_0)$. By choosing C properly and for n large enough,
we make sure that $\mu_2(x_0,y_0) = p(y_0|x_0) + C(nh^2)^{-\gamma}$ falls within either the interval
$(\mu_1(x_0,y_0), p(y_0|x_0))$ or $(p(y_0|x_0), \mu_1(x_0,y_0))$. Hence, there exists an
$a \in (0,1)$ such that $\mu_2(x_0,y_0) = a\,p(y_0|x_0) + (1 - a)\,\mu_1(x_0,y_0)$. The convexity
of $\ell(\cdot)$ and the fact that $\ell\{p(y_0|x_0)\} = 0$ lead to

$$\ell\{\mu_2(x_0,y_0)\} \le (1 - a)\,\ell\{\mu_1(x_0,y_0)\}.$$

Since Lemma A.9 implies that $\ell\{\mu_2(x_0,y_0)\} \to \infty$ in probability, with
$\mu_2(x_0,y_0) = \mu(x_0,y_0)$, we have, as $n \to \infty$, $\ell\{\tilde p_{\hat\theta}(y_0|x_0)\} \to \infty$ in probability,
which implies $N(h) \to \infty$ in probability as $n \to \infty$. This means
$L(h) \to \infty$. The lemma is proved by noting that $P(L_n > x) \ge P\{L(h_i) > x\}$ for
any $i \in \{1, \ldots, J\}$. □
We now turn to the bootstrap EL test statistic $N^*(h)$, which is a version of
N(h) based on $\{X_t^*\}_{t=1}^{n+1}$ generated according to the parametric transitional
density $p_{\hat\theta}$. Let $\hat p^*(x,y)$ and $\tilde p^*_\theta(x,y)$ be the bootstrap versions of $\hat p(x,y)$ and
$\tilde p_\theta(x,y)$, respectively, and let $\hat\theta^*$ be the maximum likelihood estimate based on
the bootstrap sample. Then the following expansion, similar to (A.1), is valid
for $N^*(h)$:

$$\begin{aligned}
N^*(h) &= (nh^2)R^{-2}(K)\iint \frac{\{\hat p^*(x,y) - \tilde p^*_{\hat\theta^*}(x,y)\}^2}{p_{\hat\theta}(x,y)}\,\omega(x,y)\,dx\,dy + o_p(h) \\
&= N_1^*(h) + N_{2\hat\theta}^*(h) + N_{3\hat\theta}^*(h) + o_p(h),
\end{aligned} \tag{A.25}$$

where $N_j^*(h)$ for j = 1, 2 and 3 are the bootstrap versions of $N_j(h)$, respectively.
As the bootstrap resample is generated according to $p_{\hat\theta}$, the same arguments
that led to Lemma A.5 mean that $N_2^*(h) = N_3^*(h) = 0$. Thus,
$N^*(h) = N_1^*(h) + \tilde o_p(h)$, where $N_1^*(h) = \sum_{i=1}^{6} N_{1i}^*(h)$ is the bootstrap analogue of
the decomposition of $N_1(h)$. Lemmas similar to Lemmas A.6 and A.7 can be established
to study the $N_{1i}^*(h)$, which are the bootstrap versions of the $N_{1i}(h)$.

Proof of Theorem 2. It can be shown, by taking the same route that
establishes Lemma A.8, that, as $n \to \infty$ and conditioning on $\{X_t\}_{t=1}^{n+1}$, the
distribution of $(L^*(h_1), \ldots, L^*(h_J))$ converges to $N_J(\beta\mathbf{1}_J, \Sigma_J)$ in probability,
which readily implies the conclusion of the theorem. □

Let $l^*_\alpha$ be the upper-α conditional quantile of $L_n^* = \max_{h\in\mathcal{H}} L^*(h)$ given $\{X_t\}_{t=1}^{n+1}$.

Proof of Theorem 3. Let $l_{0\alpha}$ be the upper-α quantile of $\max_{1\le j\le J} Z_j$.
From Theorem 2, and due to the use of the parametric bootstrap,

$$l^*_\alpha = l_{0\alpha} + o_p(1) \tag{A.26}$$

under both $H_0$ and $H_1$. As $L_n \xrightarrow{d} \max_{1\le i\le J} Z_i$ under $H_0$, by the Slutsky theorem,

$$P(L_n > l^*_\alpha) = P\{L_n + o_p(1) > l_{0\alpha}\} \to P\Big(\max_{1\le i\le J} Z_i > l_{0\alpha}\Big) = \alpha,$$

which completes the first part of Theorem 3. The second part of Theorem 3
is a direct consequence of Lemma A.10 and (A.26). □